The origin of black hole entropy and the black hole information problem provide important clues 
for trying to piece together a quantum theory of gravity. Thus far, discussions on this topic have 
mostly assumed that in a consistent theory of gravity and quantum mechanics, quantum theory 
will be unmodified. Here, we examine the black hole information problem in the context of generalisations 
of quantum theory. In particular, we examine black holes in the setting of generalised 
probabilistic theories, in which quantum theory and classical probability theory are special cases. 
We compute the time it takes information to escape a black hole, assuming that information is 
preserved. We find that under some very general assumptions, the arguments of Page (that information 
should escape the black hole after half the Hawking photons have been emitted), and the 
black-hole mirror result of Hayden and Preskill (that information can escape quickly) need to be 
modified. The modification is determined entirely by what we call the Wootters-Hardy parameter 
associated with a theory. We find that although the information leaves the black hole after enough 
photons have been emitted, it is fairly generic that it fails to appear outside the black hole at this 
point - something impossible in quantum theory due to the no-hiding theorem. The information is 
neither inside the black hole, nor outside it, but is delocalised. Our central technical result is an 
information decoupling theorem which holds in the generalised probabilistic framework. 

One of the central projects of theoretical physics is to construct a theory in which both gravity and quantum 
mechanics can be combined consistently. In doing so, we have very little to guide us in the way of experiments. 
However, the black hole does provide a model system which can aid us in this task. From the work of Bekenstein 
and Hawking, black holes possess a thermodynamical entropy which is usually attributed to quantum gravitational 
microstates. A theory of quantum gravity ought to pass the test of predicting that the entropy of a black hole is given 
by a quarter of the black hole's area [BE]. Likewise, a theory of quantum gravity must navigate its way through the 
black hole information problem [3HS]: either the theory preserves information (as in quantum theory), in 
which case it ought to explain how information is apparently able to escape the black hole horizon, or information 
is destroyed, in which case it must explain how this can happen while still apparently preserving conservation laws [6 8]. 

However, thus far, the discussion on the black hole information problem has largely been within the context 
of theories in which quantum mechanics is unmodified. This seems an undue restriction, given that the central 
motivation for studying black hole thermodynamics and information is that it can lead us to other consistent theories 
which are experimentally compatible with gravity and quantum theory. If studying black holes is going to allow us 
to explore what form a theory of quantum gravity might take, then we shouldn't be confining ourselves to theories 
in which quantum theory is unmodified. The aim of the present article is to begin to remedy this, by expanding 
the discussion to include generalisations of quantum theory, in the hope that it will allow us to explore black hole 
information in a more robust setting. 

We will take as our starting point the assumption that the consistent theory of nature fits in the class of generalised 
probabilistic theories (GPTs). This is a very unrestrictive framework. Quantum theory and classical theory are but 
special cases of GPTs. Crucially, any theory whose operational output is the probabilities of outcomes of measurements, 
conditional on a choice of system preparation and subsequent transformation, can be formulated in this way [§]. 
GPTs also have a natural notion of reversible time evolution, generalising the unitary time evolution of quantum 
mechanics. GPTs are currently being studied extensively in quantum information theory, since examples of GPTs 
exist that exhibit interesting information-theoretic behaviour that deviates from quantum theory, such as superstrong 
nonlocality [TDHT5]. We recommend [51 EH HI] as background for readers unfamiliar with GPTs. 

We will take as an axiom that information is preserved in such theories in that time evolution is reversible (akin 
to unitarity in quantum theory), and then use this framework to study information in black holes. In particular, 
what is of interest is the tension between information preservation and Hawking's calculation, which suggests that at 
least semi-classically, a black hole radiates information thermally, apparently destroying information. If information 
is preserved, and escapes from a black hole before quantum gravitational effects come into play, then one can find a 
set of space-like hypersurfaces, such that information appears to be cloned [T3]. Cloning is not possible in quantum 
theory or any other generalised probabilistic theory apart from classical probability theory [IB]. On the other hand, 
if Hawking's semi-classical calculation holds until the black hole is of the Planck mass, and quantum gravitational 
effects come into play, then one effectively has a long lived black hole remnant with a lot of information stored inside, 
and this presents a host of associated difficulties [5J [T?HT§] . 

The speed at which information leaves the black hole is thus an important question. In quantum theory, three 
cases of particular interest here have been considered: (i) for the case of a black hole which is initially in a pure 
quantum state, Page argued that if information is preserved, then, under certain assumptions, it would have to start 
escaping when half the photons had been emitted [20], and thus the theory needs to find a way around the cloning 
argument. Black hole complementarity [T71[STJ[S2] is one such mechanism to avoid the cloning argument; (ii) classical 
information, on the other hand, can be locked inside a black hole until the very end of the evaporation process [23] 
without suffering from the problems usually associated with remnants; (iii) the same mechanism which produces the 
locking of the classical information will also cause a black hole to emit its information almost instantly, as if it were 
a mirror, in the case where the state of the black hole is initially entangled with the outside [24]. This latter result 
pushes any mechanism which avoids the cloning argument to its very limits, and has possible implications for the 
amount of time it takes for black holes to scramble information [55J [2BJ. 

In this paper, we study scenarios (i) and (iii) in the general setting beyond quantum theory. We compute the 
corresponding information retrieval properties for all GPTs that satisfy some natural assumptions, including quantum 
and classical probability theory as special cases. It turns out that post-quantum theories behave quantitatively 
differently from quantum theory: in Page's scenario (i), the black hole may emit much more than half of the photons 
until information escapes. Hayden and Preskill's mirror result for case (iii) remains valid qualitatively, but with 
an interesting difference. To analyze this result in the generalized context, we prove a version of the decoupling 
theorem [25] [271 HE] for GPTs. A decoupling theorem essentially tells us how easy it is to remove the correlations 
from a state. In quantum theory, the decoupling theorem would tell us how quickly information generically leaves a 
black hole, and also, how quickly this information appears outside the black hole. This is because in quantum theory, 
information cannot be encoded in the correlations between the black hole and the radiation, but must reside almost 
entirely in one or the other. Thus, if information cannot be found inside the black hole any more, it must be localised 
outside of it. In the context of quantum information theory, this is a consequence of decoupling [37], and in the 
context of black holes, it is the no-hiding theorem [30]. Its approximate version is related to Uhlmann's theorem [29] 
(see [27]). 

We find that for GPTs, the mirror result remains valid, but the no-hiding theorem does not hold in general. This 
leads to the intriguing possibility that information can escape the black hole quickly, but not be found outside of it, 
instead becoming delocalised. If the information is delocalised for a long enough time, then this could potentially 
serve as an alternative to black-hole complementarity, as it avoids the problem of there being a hypersurface in which 
there is a copy of the information both inside and outside the black hole. 

We start by first describing the general class of theories we consider, and then the physical situation of black hole 
evaporation as recast in [23] for the purpose of an information-theoretic analysis. Our central technical result is 
Theorem 1, proven in the Appendix. After introducing it, we apply it to scenarios (i) and (iii) above and contrast our 
results for post-quantum theories to the known quantum results. Our main conclusion is that for generic potential 
generalisations of quantum theory one can have preservation of information, but the rate at which information leaves 
the black hole is modified. In particular, information can escape very late in the evaporation process, and can even 
be delayed until the point when the black hole is no longer semi-classical, thus respecting the semi-classical result of 
Hawking, yet without resulting in the problems associated with information crossing a causal horizon. Likewise, the 
fact that information can become delocalised in such theories could potentially be used as an alternative to black-hole 
complementarity. 

I. GENERAL PROBABILISTIC THEORIES 

In General Probabilistic Theories (GPTs), one assigns states $\omega$ to any physical system (in quantum theory, this 
would be the density matrix $\rho$). If $A$ is a physical system, the set of all states, the state space, will be denoted $\Omega_A$; 
it can always be chosen as some subset of $\mathbb{R}^n$ with suitable $n$. By assumption, it is possible to prepare either some 
state $\omega$ with probability $p$, or some state $\varphi$ with probability $1 - p$, yielding $p\omega + (1 - p)\varphi$. Thus, every state space $\Omega_A$ 
is convex, and for further physical reasons compact. This is also true for quantum $n$-level systems, where the state 
space is the convex set of $n \times n$ density matrices. As in quantum theory, we call a state mixed if it can be 
written in the form $p\omega + (1 - p)\varphi$ for some $0 < p < 1$ and $\omega \neq \varphi$, and otherwise pure. Thus far, GPTs have only 
been studied in the context of describing a physical system which exists in space-time; they have not been applied 
to the description of space-time itself. While it is likely that a proper account of black hole information will need to 
concern itself with describing space-time, such considerations are clearly beyond our current understanding. 

Imagine some measurement with $k$ outcomes. Applying it to some state $\omega$, the probability to obtain specifically 
the first outcome can be denoted $e_1(\omega)$, which is a real number in the interval $[0, 1]$. If we feed a statistical mixture 
into the measurement device, we get the probability $e_1(p\omega + (1 - p)\varphi) = p\,e_1(\omega) + (1 - p)\,e_1(\varphi)$. Thus, $e_1$ is a linear 
map which is non-negative on all states; we call these maps effects. The further measurement outcomes are similarly 
described by effects $e_2, \ldots, e_k$ such that the total probability is $\sum_i e_i(\omega) = 1$. In quantum theory, if $\omega$ is a density 
matrix, every effect $e$ has the form $e(\omega) = \mathrm{tr}(P\omega)$, with some matrix $0 \le P \le \mathbb{1}$ (e.g. a projector). 
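
As a small illustration of the linearity of effects in the quantum special case, the following sketch evaluates $e(\omega) = \mathrm{tr}(P\omega)$ on a statistical mixture. The particular states, the projectors onto $|\pm\rangle$, the mixing weight and the helper name `effect` are illustrative assumptions and are not taken from the text.

```python
import numpy as np

def effect(P, omega):
    """Probability tr(P omega) of the outcome described by the matrix P."""
    return float(np.real(np.trace(P @ omega)))

# A two-outcome measurement: projectors onto |+> and |-> (illustrative choice).
plus = np.array([[1, 1], [1, 1]], dtype=complex) / 2
minus = np.eye(2, dtype=complex) - plus

# Two qubit states: the pure state |+><+| and the maximally mixed state.
omega = plus.copy()
phi = np.eye(2, dtype=complex) / 2

# Statistical mixture p*omega + (1-p)*phi with an arbitrary weight p.
p = 0.3
mixture = p * omega + (1 - p) * phi

# Linearity: the effect of the mixture equals the mixture of the effects.
lhs = effect(plus, mixture)
rhs = p * effect(plus, omega) + (1 - p) * effect(plus, phi)

# Normalisation: the outcome probabilities of the measurement sum to one.
total = effect(plus, mixture) + effect(minus, mixture)
```

Here `lhs` and `rhs` agree up to rounding, and `total` equals one, which are the two properties that define a measurement in the GPT framework.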

Transformations must map states to states - since they must respect statistical mixtures, they must be linear. In 
the following, we are only interested in reversible transformations T, that is, ones that have an inverse transformation 
T _1 and thus do not destroy information. We do not consider transformations which destroy information, since the 
entire crux of the black hole information problem is the question of whether information preserving transformations 
are consistent with what we know about black holes. To every physical system $A$, there is a compact (possibly finite) 
group of reversible transformations $\mathcal{G}_A$. In quantum theory, these are the unitaries, $\rho \mapsto U\rho U^\dagger$. They are symmetries 
of the state space. 

Figure 1 gives an example of a GPT state space other than quantum theory [T3]. 



FIG. 1: A simple example of GPT state spaces other than quantum theory. The inner circle is the equatorial plane of the 
quantum theory Bloch sphere, with all states on the circle pure ($|\pm\rangle := \frac{1}{\sqrt{2}}(|0\rangle \pm |1\rangle)$). One may alternatively consider the outer 
square as the state space, in which case there are only four pure states, $\omega_1, \ldots, \omega_4$. As in any theory in the GPT framework, 
any convex combination of states is allowed, implying that any point in the square must be an allowed state. This outer square 
state space, which contains the quantum states, is a non-quantum example of a GPT state space, and is called a 'gbit' [13]. 
The only possible reversible transformations would be rotations by multiples of 90° and reflections about axes through the center. 
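
The symmetry statement in the caption can be checked with a short sketch. The coordinates below (corners at $(\pm 1, \pm 1)$, so that the unit circle is inscribed) are an assumption made for illustration, since the figure's orientation is not specified in the text; the helper names are likewise hypothetical.

```python
import numpy as np

# Assumed coordinates: the four pure gbit states sit at the corners of a square.
VERTICES = np.array([[1, 1], [-1, 1], [-1, -1], [1, -1]], dtype=float)

def rotation(theta):
    """2x2 rotation matrix by angle theta."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def maps_square_to_itself(T):
    """True if the linear map T sends every corner to a corner."""
    images = VERTICES @ T.T
    return all(any(np.allclose(img, v) for v in VERTICES) for img in images)

def in_square(point):
    """True if the point lies in the convex hull of the four corners."""
    return bool(np.max(np.abs(point)) <= 1 + 1e-12)

# Rotations by multiples of 90 degrees, each optionally combined with a reflection.
reflection = np.array([[1.0, 0.0], [0.0, -1.0]])
symmetries = [rotation(k * np.pi / 2) @ r
              for k in range(4)
              for r in (np.eye(2), reflection)]

# A 45 degree rotation is a symmetry of the circle but not of the square.
quantum_only = rotation(np.pi / 4)
```

With these assumptions, all eight maps in `symmetries` preserve the square, `quantum_only` does not, and every point of the unit circle passes `in_square`, in line with the statement that the square contains the quantum states.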

In the following calculations, two quantities first introduced by Wootters and Hardy [31] turn out to be of paramount 
importance. Given some system $A$, we denote by $K_A$ the dimension of the set of 
unnormalized states; that is, $K_A = \dim(\Omega_A) + 1$, because $\Omega_A$ is the set of normalized states. Furthermore, we denote 
by $N_A$ the maximal number of states that are perfectly distinguishable in a single measurement. 

If $A$ is a quantum $n$-level system, then we can perfectly distinguish at most $n$ (orthogonal) states, hence $N_A = n$. 
In general, states $\omega_1, \ldots, \omega_n$ are perfectly distinguishable if there is an $n$-outcome measurement with effects $e_1, \ldots, e_n$ 
such that $e_i(\omega_j) = \delta_{ij}$. In quantum theory, $\Omega_A$ is the set of $n \times n$ density matrices; hence $K_A = \dim(\Omega_A) + 1 = n^2$; 
this is the number of independent real parameters in an unnormalized density matrix. In other words, in quantum 
theory, we have $K = N^2$. In contrast, a classical $n$-level system is described by a probability distribution with $n$ 
parameters $(p_1, \ldots, p_n)$. Thus, in classical probability theory, we have $K = N$. GPTs can have arbitrary relations 
between $K$ and $N$; it can only be proven in general that $K \ge N$. 
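
The two special cases $K = N^2$ and $K = N$ can be checked numerically by counting the real dimension spanned by randomly sampled states. This is only a sketch: the sampling scheme and the function names `K_quantum` and `K_classical` are illustrative assumptions, not part of the original analysis.

```python
import numpy as np

def random_density_matrix(n, rng):
    """Random n x n density matrix (positive semidefinite, unit trace)."""
    g = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

def span_dimension(vectors):
    """Dimension of the real linear span of a list of real vectors."""
    return int(np.linalg.matrix_rank(np.array(vectors)))

def K_quantum(n, seed=0):
    """Real dimension of the span of n x n density matrices."""
    rng = np.random.default_rng(seed)
    vecs = []
    for _ in range(2 * n * n):
        rho = random_density_matrix(n, rng)
        # Represent the complex matrix as one real vector.
        vecs.append(np.concatenate([rho.real.ravel(), rho.imag.ravel()]))
    return span_dimension(vecs)

def K_classical(n, seed=0):
    """Real dimension of the span of probability vectors with n entries."""
    rng = np.random.default_rng(seed)
    return span_dimension([rng.dirichlet(np.ones(n)) for _ in range(2 * n)])
```

For example, `K_quantum(3)` returns 9 and `K_classical(3)` returns 3, matching $K = N^2$ and $K = N$ for $N = 3$.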

GPTs also have a notion of composite systems. The quantum notions of subsystems and tensor products generalise 
to all GPTs under the standard assumption that signalling is not possible, i.e. that the reduced state on a subsystem 
is invariant under local operations on other subsystems [13]. This assumption is implicit in the GPT framework, and 
also in this paper: one party cannot simply send information to another party by choosing local measurements. 

II. OUR WORKING ASSUMPTIONS 

We are now interested in the class of theories for which the calculations of Page, Hayden and Preskill can be 
meaningfully generalised. In particular, we will be interested in the situation in Figure 2, which will be explained 
in more detail below. This setting involves a composition of four state spaces $A = A_1A_2$, $E$, and $C$. In general, we 
can imagine that each system is described by an arbitrary GPT, with an arbitrary choice of a compact convex state 
space. However, it is clear that at least some requirements on the state spaces must be satisfied such that the setting 
makes physical sense: if the group of transformations $\mathcal{G}_A$ contained only the identity map, for example, then no 
interesting dynamics could happen whatsoever. 

We now describe the technical assumptions that we impose on our state spaces, together with their physical meaning. 
Our first assumption is called transitivity: for every pair of pure states $\varphi$, $\omega$ in a common state space, there is a 
reversible transformation $T$ such that $T\varphi = \omega$. In our context, this is a very natural assumption: in order to study the 
black-hole information paradox, we only consider reversible time evolution. Moreover, we imagine that pure states are 
prepared by starting from a single reference state (like the vacuum state, for example) and applying some reversible 
time evolution. 

Our second assumption is of a technical nature: we assume that the group of reversible transformations $\mathcal{G}_A$ acts 
irreducibly on the state space; in fact, we assume that it is an irrep in the usual sense of group representation 
theory [32]. This assumption is not crucial: our calculations can be done for more complicated situations, but it 
keeps the calculations and results simple to start with. It is true for the group of unitary conjugations in quantum 
theory, and also for the state space of classical statistics, where $\mathcal{G}_A$ consists of the permutations of entries of the 
probability vector. 

Our next assumption comes from the physical requirement that state spaces contain "classical" subsystems. 
That is, on every state space $A$, there should be a set of perfectly distinguishable, pure states $\omega_1, \ldots, \omega_{N_A}$ that have 
all the properties of "classical" configurations: they can be permuted by reversible time evolution, and their uniform 
mixture is the state of "maximal ignorance", the maximally mixed state $\mu^A$. When we have a compound system $AB$, 
then its classical subsystem can be obtained from combining the classical subsystems of $A$ and $B$. 

In quantum theory, a classical subsystem would be the states in some orthonormal basis. The existence of classical 
subsystems is empirically motivated: in some limit, or, say, after decoherence, systems behave very classically. In our 
setting, it makes sense to assume that the physically relevant GPT contains quantum theory as a subspace, which in 
turn contains classical subsystems as states in some orthonormal basis. 

Our final requirement is on how different state spaces $A$ and $B$ are combined into a joint state space $AB$. In GPTs, 
joint state spaces must satisfy some minimal requirements: if $\omega^A$ and $\omega^B$ are states on $A$ and $B$, there must always be 
a "product state" $\omega^A\omega^B$ on $AB$, and similarly for effects and transformations. This already implies for the dimensions 
that $K_{AB} \ge K_A K_B$. We now assume that $K_{AB} = K_A K_B$; that is, that the number of degrees of freedom of the joint 
state space is in this sense "minimal". In our setting, this is a very natural and almost mandatory assumption: if 
$K_{AB} > K_A K_B$, then there are holistic degrees of freedom which are neither localised in the black hole nor outside of 
it. Information inside the black hole could then be transferred to these extra holistic degrees of freedom which could 
then only be accessed by joint measurements on the black hole and systems outside it. This is perhaps an interesting 
potential route to allowing information to be preserved as the system originally carrying it enters the black hole. 
However, the question of how quickly information leaves the black hole is then not well-defined, or perhaps not even 
relevant, and we shall therefore not discuss this case further here. 

The assumption that $K_{AB} = K_A K_B$ has an information-theoretic interpretation that is sometimes called local 
tomography: every state $\omega^{AB}$ on $AB$ is uniquely determined by the statistics and correlations of local measurements 
on $A$ and $B$. In other words, to determine a global state $\omega^{AB}$, it is sufficient to perform local measurements and 
subsequently analyze the correlations of local outcomes (obtained from many measurements on independent copies). 
It can also be rephrased as the fact that the product states $\omega^A\omega^B$ span the composite state space. By our definition 
of $K_A$, the unnormalized states are vectors in the real linear space $L_A := \mathbb{R}^{K_A}$. So local tomography means that 
$L_{AB} = L_A \otimes L_B$; that is, unnormalized global states are carried by a vector space that is the tensor product of the 
local vector spaces. Again, this requirement holds for classical and quantum state space, and it is also true for most 
alternative GPTs that have been studied in the literature [13]. 
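
For the quantum special case, the statement that product states span the composite state space can be tested numerically. The sketch below, with hypothetical helper names and a random sampling scheme chosen only for illustration, computes the real dimension spanned by products of local density matrices and compares it with $K_A K_B = (n_A n_B)^2$.

```python
import numpy as np

def random_density_matrix(n, rng):
    """Random n x n density matrix (positive semidefinite, unit trace)."""
    g = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

def real_vector(rho):
    """Flatten a complex matrix into one real vector."""
    return np.concatenate([rho.real.ravel(), rho.imag.ravel()])

def product_span_dimension(n_a, n_b, seed=1):
    """Real dimension spanned by product states rho_A (x) rho_B."""
    rng = np.random.default_rng(seed)
    samples = 2 * (n_a * n_b) ** 2
    vecs = [
        real_vector(np.kron(random_density_matrix(n_a, rng),
                            random_density_matrix(n_b, rng)))
        for _ in range(samples)
    ]
    return int(np.linalg.matrix_rank(np.array(vecs)))
```

For two qubits, `product_span_dimension(2, 2)` returns 16, which equals $K_A K_B = 4 \cdot 4$ and also the full $K_{AB} = 4^2$ of a four-level quantum system, so no holistic degrees of freedom remain.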

We call the above requirements the "standard assumptions"; see Definition 2 in the appendix for the full mathematical 
details. They imply also that $N_{AB} = N_A N_B$. 

We additionally require that although the fundamental theory of nature may not be quantum theory, physics outside 
the black hole is very well described by quantum theory, and that semi-classical gravity also remains valid. So for 
example, systems inside the black hole and at Planck energy, can behave very differently to systems in quantum 
theory, but the photons which escape the black hole should obey quantum theory. This is related to our "classical 
subsystems" assumption above: states that describe systems far outside the black hole should lie in a quantum 
"subspace" of the more fundamental GPT, much as classical systems can be thought of as occupying a subspace 
of quantum theory (given by diagonal density matrices). 



III. THE PHYSICAL SETUP 



[Figure 2: diagram with the labels C: Charlie (reference system); B: Black hole; M: Alice's message; $A_2$: Black hole remainder.] 

FIG. 2: Alice throws her message into the black hole. There are three parties: Alice's message ($M$), the earlier black hole ($B$), 
Bob ($E$) and the reference system ($C$). (The reference system is particularly natural to include in the case of quantum theory 
where one may always take $C$ to be a purifying system such that $\psi^{AEC}$ is pure, but we shall not be assuming such purification 
is always possible.) Then a reversible interaction $U$ is applied to Alice's system, representing the black hole dynamics acting 
on her diary after she has thrown it in. At some given subsequent time some of the black hole ($A_1$) has leaked out, e.g. via 
Hawking radiation, and is now in the possession of Bob, and relabelled as $E_2$. Bob also holds any radiation predating Alice's 
message having entered the black hole ($E$), and he can perform a joint operation $W$ on the system $EA_1 = E_1E_2$. We also have 
$BM = A_1A_2 = A$. 

We now fix some notation and describe the physical situation. At some time $t_0$, Alice holds the state of system 
$M$. We will assume that this state can be described by quantum theory, since it is outside the black hole. The state 
is entangled with an external referee $C$ which we call Charlie. That is, there is a global entangled state $\psi^{MC}$ which 
is held by Alice and Charlie, who are both outside of the black hole; to simplify the calculation we assume that it 
is pure. Essentially, we can interpret the correlation between M and C as meaning that system M has information 
about C. This will allow us to quantify what it means for information to be inside the black hole, or to escape from 
the black hole. 

Then, Alice throws her system, which is in some (mixed) marginal state $\psi^M$, into a black hole $B$. The black hole was 
formed in the far past in a pure state; since then, it has already potentially emitted Hawking radiation. The subsystem 
carrying all previously emitted Hawking radiation is denoted $E$. Alice does not take part in the rest of this thought 
experiment. We denote the joint system $BM$ by $A$, and the total initial state at time $t_0$ by $\psi = \psi^{CAE} = \psi^{CMBE}$. In 
what follows, unless otherwise indicated, all states and transformations are in the context of GPTs. 

We now consider some later time $t > t_0$, during which the black hole has been evaporating. Note that we use the 
notion of "time" merely as an illustration, and not as an ingredient in actual calculations: all that is important for our 
setup is the ordering in which different subsystems are held by different parties. For concreteness, we may imagine 
that $t$ is the time measured by the outside observer Charlie. The quantity that is relevant for our final result turns 
out to be the number of emitted radiation quanta, $\log N_{A_1}$. 

We assume that black hole evaporation is accomplished by some total reversible time evolution U . The input of U 
is the system inside the black hole, i.e. the original black hole B and the system M that Alice subsequently threw 
into it. That is, U is some reversible transformation acting on A = BM. Additionally, some of what was in the black 
hole is emitted as Hawking radiation. The system composed of all quanta that have been emitted between times $t_0$ 
and $t$ is denoted by $A_1$; we denote by $A_2$ the part which remains inside the black hole until time $t$. As state spaces, 
we have $A = A_1A_2 = BM$. The action of $U$ can be understood as randomly selecting a subsystem $A_1$ to be emitted, 
which is very similar to the setup in [33]. Denote the global state at time $t$ by $\sigma(U)$ (since it depends on $U$). That 
is, $\sigma(U)^{CAE} = U_A \otimes \mathbb{I}_{CE}(\psi^{CAE})$, since the system $CE$ is far from the black hole and so remains unchanged (our 
calculations remain valid if these systems evolve reversibly locally in a way which does not create correlations with 
the other parties). 

Recall the state space E associated to the previously emitted Hawking radiation; in our thought experiment, we 
imagine that all the emitted quanta have been collected by an agent that we call Bob who resides far away from the 
black hole. At time t, this will also include all the radiation which has been emitted from the black hole up to this 
point, and thus we imagine that Bob has access to the joint system $EA_1$. At this point, Bob may attempt to decode 
some of Alice's information by applying an arbitrary local transformation $W$; this way, he may hope to obtain (some 
of) the correlations with Charlie that were initially held by Alice. 

IV. THE DECOUPLING THEOREM FOR GPTS 

We are now interested in knowing when the information that Alice has put into the black hole leaves it. That is, we 
would like to know when the system $A_2$ which remains inside the black hole is decoupled from the reference system $C$, 
such that $\sigma(U)^{CA_2} \approx \psi^C \otimes \mu^{A_2}$, where $\mu^{A_2}$ denotes the maximally mixed state on $A_2$. In particular, we are interested 
in how much needs to be removed (i.e. how big $A_1$ needs to be) until there are no correlations between what remains 
inside the black hole ($A_2$) and the reference system $C$ that $A$ was originally correlated to. Intuitively, a statement 
like this is important in the context of the black hole information problem, because it can tell us how many photons 
typically escape the black hole before information leaks out. This is because information is always information about 
something, and in this case, the correlation between $A$ and $C$ can be thought of as the information that $A$ has about 
$C$. In other words, we think of $C$ as being the source of the original information, and $M$ carries information about $C$. 
So, if initially $A$ and $C$ are fully correlated, and then after some time $t$, the part $A_2$ which is inside the black hole 
is no longer correlated with C, then we know that all the information has left the black hole. In the quantum case, 
this also implies that the information is now outside the black hole, but we will see that this is not the case for more 
general theories. 

If the total system is quantum, the standard decoupling theorem says the following. For almost all evolution 
laws $U$, this property holds: $\sigma(U)^{CA_2} \approx \psi^C \otimes \mu^{A_2}$, where $\mu^{A_2} = \mathbb{1}/d_{A_2}$ denotes the maximally mixed state on $A_2$, 
provided that the dimension $d_{A_1}$ of the part $A_1$ of $A$ which is removed satisfies 

$$2 \log d_{A_1} \gg \log d_A + \log d_C + \log \mathrm{tr}\left((\psi^{CA})^2\right). \qquad (1)$$

That is, the remaining black hole $A_2$ is almost uncorrelated with ("decoupled" from) $C$, provided that enough 
radiation has emerged from the black hole. In many cases, Equation (1) implies that $2\log d_{A_1} \gtrsim n \cdot I(C : A)$, where 
$I(C : A) := S(C) + S(A) - S(CA)$ is the mutual information, with $S(X) := -\mathrm{tr}(\rho^X \log \rho^X)$ the von Neumann entropy. 
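
The quantum decoupling behaviour can be illustrated numerically on a small system. The sketch below is not the paper's calculation: it assumes a toy setup in which $C$ and $M$ are single qubits in a maximally entangled state, $B$ is maximally mixed, $U$ is one Haar-random unitary on $A = MB$, and $A_1$ consists of the first few qubits of $A$ after the evolution. All function names are hypothetical.

```python
import numpy as np

def haar_unitary(d, rng):
    """Haar-random d x d unitary via QR of a complex Gaussian matrix."""
    z = (rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    phases = np.diag(r) / np.abs(np.diag(r))
    return q * phases  # rescale columns so the distribution is Haar

def decoupling_error(n_qubits_a, k_emitted, seed=0):
    """Trace norm of sigma^{C A_2} - psi^C (x) mu^{A_2} for one random U."""
    rng = np.random.default_rng(seed)
    d_c, d_m = 2, 2
    d_a = 2 ** n_qubits_a
    d_b = d_a // d_m
    d_a1 = 2 ** k_emitted
    d_a2 = d_a // d_a1

    # psi^{CA} = (maximally entangled state on CM) (x) (maximally mixed B).
    bell = np.eye(d_c).reshape(-1) / np.sqrt(d_c)
    psi_cm = np.outer(bell, bell.conj())
    psi_ca = np.kron(psi_cm, np.eye(d_b) / d_b)

    # Evolve A = MB with a random unitary; C is left untouched.
    u = np.kron(np.eye(d_c), haar_unitary(d_a, rng))
    sigma = u @ psi_ca @ u.conj().T

    # Trace out the emitted part A_1 (indices b) to get the state on C A_2.
    sigma = sigma.reshape(d_c, d_a1, d_a2, d_c, d_a1, d_a2)
    sigma_ca2 = np.einsum('abcdbf->acdf', sigma).reshape(d_c * d_a2, d_c * d_a2)

    # Compare with the decoupled state psi^C (x) mu^{A_2}.
    target = np.kron(np.eye(d_c) / d_c, np.eye(d_a2) / d_a2)
    return float(np.abs(np.linalg.eigvalsh(sigma_ca2 - target)).sum())
```

Calling `decoupling_error(6, k)` for increasing `k` shows the distance to the decoupled state shrinking as more qubits are emitted, which is the qualitative content of the decoupling theorem in this toy setting.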

Furthermore, in the quantum case, the fact that information has left the black hole (i.e. the black hole has decoupled 
from $C$) implies that the information is now located outside the black hole, and can be reconstructed by Bob [27]. 
This is because in quantum theory, up to a unitary on the purifying system, there is only one pure state compatible 
with any density matrix on $AC$. So if the remaining state of the reference system and the black hole is decoupled, i.e. 
$\psi^C \otimes \mu^{A_2}$, then there exists an isometry $W$ acting on the purifying system $E$ and taking it to systems $E_1 \simeq M$ and 
$E_2$ such that the state on the entire system is $(W_E \otimes \mathbb{I}_{CA_2})(\sigma(U)) \approx \psi^{CE_1} \otimes \phi_+^{E_2A_2}$, where $\psi^{CE_1} \approx \psi^{CM}$, and $\phi_+$ 
denotes a maximally entangled state. That is, the correlations with $C$ that were initially located in system $M$ are now 
on Bob's system $E_1$. It is this general principle (related to Uhlmann's theorem) which has been dubbed "no-hiding" 
in the context of black holes [30]. 

We now wish to explore how this result becomes modified if our 'quantum' theory of gravity involves a GPT other 
than quantum theory. 

Below we state a general probabilistic decoupling theorem, proven in the appendix. We need to introduce 
two notions before stating the result. Firstly, the 2-norm $\|\cdot\|_2$ is the usual Euclidean norm, with the subtlety that the 
state space is represented such that all reversible transformations are orthogonal while all pure states are on the unit 
sphere surrounding the maximally mixed state. Secondly, the generalised purity $\mathcal{P}(\psi)$ is defined as $\mathcal{P}(\psi) := \|\psi - \mu\|_2^2$, where $\mu$ is the 
maximally mixed state. This is 1 if and only if $\psi$ is pure (not a mixture of other states), and 0 if and only if $\psi$ is the 
maximally mixed state [34]. See Table I in the appendix for what these notions are in the special cases of quantum 
theory and classical probability. 
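
For a single qubit, the picture of pure states lying on a unit sphere around the maximally mixed state is the familiar Bloch ball, and the generalised purity is the squared length of the Bloch vector. The sketch below compares this with the rescaled expression $(N\,\mathrm{tr}\rho^2 - 1)/(N - 1)$; that closed form is an assumption made here for the quantum special case (the paper's own expressions are in its appendix table, which is not reproduced in this section), and the function names are hypothetical.

```python
import numpy as np

PAULIS = [
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]

def bloch_vector(rho):
    """Bloch vector of a qubit state; the maximally mixed state sits at the origin."""
    return np.array([np.real(np.trace(rho @ s)) for s in PAULIS])

def purity_from_bloch(rho):
    """Squared Euclidean distance from the maximally mixed state (qubit only)."""
    r = bloch_vector(rho)
    return float(r @ r)

def purity_from_trace(rho):
    """Assumed quantum form of the generalised purity for an N-level system."""
    n = rho.shape[0]
    return float((n * np.real(np.trace(rho @ rho)) - 1) / (n - 1))

pure = np.array([[1, 0], [0, 0]], dtype=complex)
maximally_mixed = np.eye(2, dtype=complex) / 2
partly_mixed = np.diag([0.75, 0.25]).astype(complex)
```

Both functions give 1 for `pure`, 0 for `maximally_mixed`, and the same intermediate value for `partly_mixed`.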

Theorem 1 (Decoupling in GPTs). Consider the situation depicted in Figure 2 with the notation $\sigma(U) := U_A \otimes \mathbb{I}_{CE}(\psi)$, 
where $U \in \mathcal{G}_A$ is a random reversible transformation. If the standard assumptions hold, we have 

[Display equation not recoverable from the extracted text: an expression for the average over $U \in \mathcal{G}_A$ whose surviving fragments show the factor $(N_C N_{A_2} - 1)(K_A - 1)$.] 

Under an extra assumption on the subsystem $CA_2$ (cf. the corresponding theorem in the supplementary material), we get 

$$\int_{U \in \mathcal{G}_A} \left\| \sigma(U)^{CA_2} - \psi^C \otimes \mu^{A_2} \right\|_1^2 \, dU \lesssim \mathcal{P}(\psi^{CA}) \cdot \frac{N_{AC}}{K_{A_1}} \qquad (2)$$

if all involved $N$ and $K$ are large, where $\|\omega - \varphi\|_1$ denotes the maximal difference in probability of any outcome of 
any possible measurement that can be applied to the states $\varphi, \omega$. 

The theorem is proven in the technical supplement, which also contains the detailed definitions of the GPT frame- 
work and of our assumptions. 

V. IMPLICATIONS FOR HAYDEN-PRESKILL SCENARIO 

Similarly to Hayden and Preskill [23], we consider the situation that the black hole has been radiating for a very 
long time, such that it has become maximally entangled with its Hawking radiation. The state $\psi^{BE}$ is maximally 
entangled at time $t_0$, by which we mean that the reduced state $\psi^B$ is close to the maximally mixed state, $\psi^B \approx \mu^B$. 
The state $\psi^{CA}$ which appears in the theorem above is then 

$$\psi^{CA} = \psi^{CM} \otimes \mu^B,$$

with $\psi^{CM}$ pure. According to Theorem 28 in [34], its purity is 

$$\mathcal{P}(\psi^{CA}) = \mathcal{P}(\psi^{CM}) \cdot \frac{N_C N_M - 1}{N_C N_M N_B - 1} \approx \frac{1}{N_B} = \frac{N_M}{N_A},$$

where the approximation is true if all $N$'s are large enough such that subtracting unity can be neglected. According 
to Theorem 1, this gives a criterion for when the state $\sigma(U)^{CA_2}$ is operationally (i.e. in $\|\cdot\|_1$-norm) almost 
indistinguishable from the uncorrelated state $\psi^C \otimes \mu^{A_2}$: 

$$\sigma(U)^{CA_2} \approx \psi^C \otimes \mu^{A_2} \quad \text{if} \quad \mathcal{P}(\psi^{CA}) \cdot \frac{N_{AC}}{K_{A_1}} \ll 1 \;\Leftrightarrow\; K_{A_1} \gg N_{MC}.$$

In other words, the black hole must have radiated away "enough" Hawking radiation in $A_1$, with a lower bound given 
by the size of the system shared between Alice ($M$) and Charlie ($C$). To simplify the discussion, we may assume that 
Alice and Charlie carry the same types of systems, i.e. $N_M = N_C$, such that $N_{MC} = N_M^2$. 

Let us first consider the case of quantum theory. In this case, we have K Al = N Ai , and so the decoupling 
condition becomes N Al 3> Nm- If Alice's state consists of k qubits, we have Nm = 2 fe , and the condition is ensured if 
k + c qubits have been radiated away in A\ (with some small constant c), because then N Al = 2 k+c Nm- We have 
thus recovered the result by Hayden and Preskill: if the black hole has radiated away enough of its degrees of freedom 
such that it is maximally entangled with the Hawking radiation, then it acts as a "mirror", effectively "bouncing 
back" in just a few additional qubits any quantum information that is thrown in. 

Theorem 1 and the discussion so far have not touched on the question of how the outside agent Bob can actually
obtain the information that the black hole has radiated away. As discussed above, in quantum theory, Bob may
recover the information by applying a suitable unitary $W$ on his system $EA_1$. This is guaranteed by the "no-hiding
theorem" or Uhlmann's theorem.

Now we consider the situation beyond quantum theory. As explained above, any GPT appearing in the context
of known physics must contain quantum theory's state space as a subspace: we expect that it behaves like quantum
theory in non-extreme situations (like outside the black hole). Thus, there must be enough degrees of freedom $K$ such
that quantum theory with its $K = N^2$ degrees of freedom is contained; i.e. $K \geq N^2$.

In order to get a more concrete picture of the decoupling situation, we will assume that there is a functional relation
between $N$ and $K$ of the form $K = N^r$, where the Wootters-Hardy parameter $r \geq 3$ is an integer (we have $r = 2$ in
quantum theory, and $r = 1$ in classical statistics). In fact, we show in Lemma 6 in the appendix that this relationship
follows from a few simple additional assumptions on top of our standard assumptions. Thus, the decoupling condition
becomes $N_{A_1} \gg N_M^{2/r}$, or for the number of generalized radiated bits $\log N_{A_1}$,

$$\sigma(U)^{CA_2} \approx \psi^C \otimes \mu^{A_2} \quad \text{if} \quad \log N_{A_1} \gg \frac{2}{r} \log N_M \quad (\text{for } N_M = N_C \text{ and } \psi^B \text{ maximally mixed}).$$

This shows that for theories beyond quantum theory, information can leave a black hole even faster than in the
quantum case. Surprisingly, if $r > 2$, then this inequality may be satisfied even if the number of radiated bits $\log N_{A_1}$
is less than the number of generalized bits $\log N_M$ that Alice has thrown into the black hole. That is, the black hole
gets decoupled from Charlie even before it had any chance to output all information that Alice has put in.

How is this possible? The only conceivable explanation seems to be that the no-hiding theorem of quantum theory loses
its validity: even if the black hole gets decoupled from Charlie, the outside agent Bob is still not able to obtain the
correlations with Charlie by applying any reversible transformation $W$. Otherwise, Alice's $k$ generalized bits would
somehow have ended up at Bob's place by transmitting $2k/r$, i.e. much less than $k$, bits.

We can interpret this finding by saying that for theories with $r > 2$, information leaves the black hole more quickly
than in the quantum case, but may not become accessible outside the black hole. We can also obtain a bound on
the time when the information starts to appear outside. Suppose that $\sigma(U)^{CEA_1} \approx \psi^{CE} \otimes \mu^{A_1}$. This means that
adding the $A_1$-system to the full outside world $CE$ increases the entropy of the outside system; it only adds noise.
This means that the emitted system $A_1$ will not carry information to the outside world. Again, we can use eq. (2)
to obtain a bound on when this happens: checking that this formula remains valid for the $CE$-versus-$A_1$ cut, this
situation happens if $N_{CE} N_A \ll K_{A_2}$. Assuming that the black hole has previously radiated away half of its degrees
of freedom, i.e. $N_E = N_B$, we obtain

$$\sigma(U)^{CEA_1} \approx \psi^{CE} \otimes \mu^{A_1} \quad \text{if} \quad \log N_{A_1} \ll \frac{r-2}{r} \log N_A \quad (\text{for } N_M = N_C \text{ and } N_E = N_B).$$

That is, at least $\frac{r-2}{r} \log N_A$ generalized bits have to be emitted before information starts to appear outside. In
quantum theory ($r = 2$), this is a trivial bound: information starts to appear directly, but not so for theories with
$r \geq 3$. Thus in this scenario, and for sufficiently large $r$, although information escapes from the black hole even more
quickly, it takes even longer for the information to appear outside the black hole.
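
The two bounds of this scenario can be tabulated with a few lines of Python. This is only a sketch under the assumptions stated in the text ($K = N^r$, $N_M = N_C$, $N_E = N_B$), and it replaces the asymptotic conditions by their threshold values: roughly $\frac{2}{r}\log N_M$ radiated bits until the black hole decouples from Charlie, and roughly $\frac{r-2}{r}\log N_A$ radiated bits before information can start to appear outside. The function name and argument names are illustrative.

```python
def hayden_preskill_thresholds(message_bits, black_hole_bits, r):
    """Threshold values (in generalized bits) for the two bounds.

    message_bits    : log N_M, size of Alice's system
    black_hole_bits : log N_A, size of the black hole after absorbing M
    r               : Wootters-Hardy parameter, assuming K = N**r
    """
    decoupled_after = 2 * message_bits / r
    appears_outside_after = (r - 2) * black_hole_bits / r
    return decoupled_after, appears_outside_after


quantum = hayden_preskill_thresholds(10, 100, 2)
post_quantum = hayden_preskill_thresholds(9, 90, 3)
```

For $r = 2$ the second threshold is zero, which is the "trivial bound" of the quantum case; for $r = 3$ it is a third of the black hole's generalized bits.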

In order to analyze the failure of the no-hiding theorem beyond quantum theory, we now consider the simplified
situation where the system $E$ does not exist, or, in other words, is one-dimensional: $N_E = K_E = 1$. Physically,
this means that Alice throws her system into a very young black hole $B$ that has not radiated so far (we may also
imagine that she forms the black hole $A = MB$ at time $t_0$ from her state $M$ and some other massive stuff $B$). Then
the situation becomes fully symmetric with respect to interchanging $A_1$ and $A_2$. Thus, in analogy to above, we can
compute bounds on the number of radiated bits which guarantee decoupling of subsystems. We consider two scenarios:

• Black hole is decoupled from Charlie. This indicates that the information has left the black hole. This is
the case if $\sigma(U)^{CA_2} \approx \psi^C \otimes \mu^{A_2}$, which holds due to eq. (2) if $N_C N_A \ll K_{A_1}$, that is, if

$$\log N_{A_1} \gg \frac{1}{r} \log N_A + \frac{1}{r} \log N_C. \qquad (3)$$

(Here we are not assuming that $N_M$ and $N_C$ are necessarily identical, in contrast to the beginning of this section.
We are also not assuming that $\psi^B$ is mixed, taking into account that the black hole may not have radiated
previously, and may thus still be pure.)

• Radiation is decoupled from Charlie. This indicates that no information has arrived outside the black
hole, since the emitted radiation is completely uncorrelated with the original source of information $C$. This is
the case if $\sigma(U)^{CA_1} \approx \psi^C \otimes \mu^{A_1}$. Swapping $A_1$ and $A_2$ in eq. (2), we see that this holds if $N_C N_A \ll K_{A_2}$, i.e.

$$\log N_{A_1} \ll \frac{r-1}{r} \log N_A - \frac{1}{r} \log N_C. \qquad (4)$$

In the following discussion we assume that $N_A \gg N_C$, i.e. that the black hole is much larger than Alice's system.

In the case of quantum theory (i.e. $r = 2$), we get the behaviour depicted in Figure 3, which is roughly what we
expect. Shown is the number of generalized bits (in quantum theory, qubits) $\log N_{A_1}$ that have been radiated away
since Alice threw her system into the black hole.

In the post-quantum case of $r \geq 3$, the behaviour becomes surprising: we get a time interval in which both the
black hole and the radiation are decoupled from Charlie (cf. Fig. 4). Thus, there cannot be an analog of quantum
theory's no-hiding theorem: decoupling of one system does not guarantee that the information can be extracted on
the remaining system.
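
To make the two regimes concrete, the sketch below evaluates the threshold values of conditions (3) and (4), read as $\frac{1}{r}\log N_A + \frac{1}{r}\log N_C$ and $\frac{r-1}{r}\log N_A - \frac{1}{r}\log N_C$ respectively, and reports the length of the interval in which both hold. It assumes $K = N^r$ and treats the asymptotic inequalities as sharp thresholds, which is a simplification; the function name is hypothetical.

```python
def decoupling_window(log_n_a, log_n_c, r):
    """Return (black_hole_decoupled_after, radiation_decoupled_until, overlap).

    A positive overlap is the number of radiated generalized bits during
    which both the black hole and the radiation are decoupled from Charlie.
    A negative overlap is a gap between the two regimes.
    """
    black_hole_decoupled_after = (log_n_a + log_n_c) / r          # eq. (3)
    radiation_decoupled_until = ((r - 1) * log_n_a - log_n_c) / r  # eq. (4)
    overlap = radiation_decoupled_until - black_hole_decoupled_after
    return black_hole_decoupled_after, radiation_decoupled_until, overlap


quantum_case = decoupling_window(100, 10, 2)
r3_case = decoupling_window(90, 9, 3)
```

In the quantum case the overlap is $-\log N_C$: the two regimes are separated by the emission of about $\log N_C$ qubits. For $r = 3$ the overlap is positive, which corresponds to the "both decoupled" interval.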



[Figure 3. Panel title: Decoupling when $K = N^2$ (quantum case). Horizontal axis: $\log N_{A_1}$. Radiation decoupled from Charlie up to $\frac{1}{2} \log N_A - \frac{1}{2} \log N_C$; emission of about $\log N_C$ qubits; black hole decoupled from Charlie from $\frac{1}{2} \log N_A + \frac{1}{2} \log N_C$ onwards.]

FIG. 3: In quantum theory, the radiation is initially decoupled from Charlie (information is not in the radiation). Around the
point where half of the black hole's qubits have been radiated away, this interval ends, and emission of a bit more than $\log N_C$
qubits leads to decoupling of the black hole and Charlie (information not in black hole). Note that the horizontal axis is the
number of emitted photons rather than the time coordinate.



[Figure 4. Panel title: Both decoupled for $K = N^r$, $r \geq 3$. Horizontal axis: $\log N_{A_1}$. Case of $r = 3$: black hole decoupled from Charlie from $\frac{1}{3} \log N_A + \frac{1}{3} \log N_C$ onwards; radiation decoupled from Charlie up to $\frac{2}{3} \log N_A - \frac{1}{3} \log N_C$; both decoupled in between.]

FIG. 4: Beyond quantum theory, there is a time interval when both the black hole and the radiation are decoupled from
Charlie (information is neither in the black hole nor in the radiation). This shows that there is no analog of quantum theory's
"no-hiding theorem".



VI. IMPLICATIONS FOR PAGE'S SCENARIO 



We will now consider a further simplification of our scenario, and assume that the system $C$ does not exist,
i.e. $N_C = K_C = 1$. That is, the black hole is formed from a pure state $\psi^A = \psi^{MB}$ at time $t_0$, and then gradually
radiates away. This will allow us to analyse Page's [35] scenario in the context of GPTs. Assuming that the black
hole implements a random reversible transformation (as we do), Page computed the expected entropy of Hawking
radiation, i.e. the entanglement entropy of the bipartite state, in terms of the number of emitted quanta.

While there is no unique generalization of Shannon or von Neumann entropy to GPTs [36-38], we can employ
our results by instead considering the Rényi entropy of order 2. For a quantum state $\rho$, this is defined as $H_2(\rho) :=
-\log \mathrm{tr}(\rho^2)$, which is zero for pure states and has the maximal value $\log n$ for the maximally mixed state on $\mathbb{C}^n$.
The expression $\mathrm{tr}(\rho^2)$ equals the purity $\mathcal{P}(\rho)$ up to some offset and factor. Taking these into account motivates the
definition

$$H_2(\omega^A) := -\log\left( \frac{1}{N_A} + \frac{N_A - 1}{N_A}\, \mathcal{P}(\omega^A) \right)$$

for arbitrary state spaces $A$ that satisfy our standard assumptions. This agrees with Rényi 2-entropy in quantum
theory by construction; moreover, it turns out to agree with the classical version of this quantity in classical probability
theory. We have $0 \leq H_2(\omega^A) \leq \log N_A$ for all state spaces; the minimum is attained iff $\omega^A$ is pure, and the maximum
is attained iff $\omega^A = \mu^A$.
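
The agreement with the classical Rényi 2-entropy can be checked directly. The sketch below assumes the classical purity from Table I, $\mathcal{P}(p) = \frac{N_A}{N_A-1}\sum_i p_i^2 - \frac{1}{N_A-1}$, and the definition $H_2 = -\log\left(\frac{1}{N_A} + \frac{N_A-1}{N_A}\mathcal{P}\right)$, with logarithms taken to base 2 (an assumption made here so that the result is in bits). Function names are illustrative.

```python
import math


def classical_purity(p):
    """Purity of a probability vector p, as in the CPT column of Table I."""
    n = len(p)
    return n / (n - 1) * sum(x * x for x in p) - 1 / (n - 1)


def renyi2_from_purity(purity, n):
    """Generalized Renyi-2 entropy (in bits) from the purity and N_A = n."""
    return -math.log2(1 / n + (n - 1) / n * purity)


p = [0.5, 0.25, 0.25]
h2_generalized = renyi2_from_purity(classical_purity(p), len(p))
h2_classical = -math.log2(sum(x * x for x in p))
```

The two values coincide up to rounding, and the extreme cases behave as stated: a deterministic distribution gives zero, and the uniform distribution gives $\log N_A$.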

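According to Theorem 1, we have

$$\int_{U\in\mathcal{G}_A} \mathcal{P}\left(\sigma(U)^{A_1}\right) dU = \int_{U\in\mathcal{G}_A} \left\| \sigma(U)^{A_1} - \mu^{A_1} \right\|_2^2 \, dU = \frac{K_{A_1} - 1}{N_{A_1} - 1} \cdot \frac{N_A - 1}{K_A - 1}.$$

As usual in high dimensions, we have a concentration of measure effect, such that in fact for "almost all" $U \in \mathcal{G}_A$,
we expect

$$\mathcal{P}\left(\sigma(U)^{A_1}\right) \approx \frac{N_{A_1}^r - 1}{N_{A_1} - 1} \cdot \frac{N_A - 1}{N_A^r - 1},$$

$$H_2\left(\sigma(U)^{A_1}\right) \approx \log N_{A_1} - \log\left( 1 + \frac{(N_{A_1}^r - 1)(N_A - 1)}{N_A^r - 1} \right).$$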

This gives us an estimate of the Hawking radiation's Rényi-2 entropy $H_2(A_1)$, depending on the number of emitted
degrees of freedom $N_{A_1}$, or emitted generalised bits $\log N_{A_1}$. Figure 5 shows plots of $H_2(A_1)$ over the number of
emitted bits $\log N_{A_1}$ for different $r$'s. For the quantum case, i.e. $r = 2$, we recover Page's result: the entropy grows
until half of the black hole's qubits have been emitted, and then decreases again. In the post-quantum realm, we see
that the entropy keeps growing for a longer period. In other words, a "smaller" post-quantum black hole can purify
a "larger" amount of outgoing radiation. It reveals information later in its lifetime than in the quantum case.
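
The qualitative shape of these curves can be reproduced with a short Python sketch. It assumes the estimate $H_2(A_1) \approx \log N_{A_1} - \log\left(1 + \frac{(N_{A_1}^r - 1)(N_A - 1)}{N_A^r - 1}\right)$ with $K = N^r$, takes the black hole to consist of $n$ generalized bits ($N_A = 2^n$), and measures entropy in bits. The helper names are hypothetical, and the integer grid in `peak_position` is chosen only for illustration.

```python
import math
from fractions import Fraction


def renyi2_radiation(m, n, r):
    """Estimated Renyi-2 entropy (bits) after m of n generalized bits are emitted."""
    n_a1, n_a = 2 ** m, 2 ** n
    # Exact rational arithmetic avoids overflow and cancellation for large powers.
    ratio = Fraction((n_a1 ** r - 1) * (n_a - 1), n_a ** r - 1)
    return m - math.log2(float(1 + ratio))


def peak_position(n, r):
    """Number of emitted bits (on an integer grid) at which the entropy is largest."""
    return max(range(n + 1), key=lambda m: renyi2_radiation(m, n, r))


peaks = {r: peak_position(12, r) for r in (2, 3, 4)}
```

Under these assumptions the maximum sits at half of the bits for $r = 2$ and moves to roughly a fraction $(r-1)/r$ of the bits for larger $r$, while the entropy returns to zero once everything has been emitted.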

Interestingly, the different theories all behave very similarly for small times, but differ strongly among each other
(and for $r \neq 2$ from quantum theory) towards the end of the black hole's lifetime, when quantum gravity effects are
expected to dominate. This implies that in theories which are more general than quantum theory, one can respect the
semi-classical result of Hawking (that outgoing radiation appears thermal) for much longer times. And for sufficiently
large values of $r$, the black hole can behave semi-classically until the Planck scale, when we anyway expect quantum
gravitational effects to occur. This gives some hope that one can resolve the black hole information problem within the
context of theories which preserve information, yet still reproduce low energy phenomena such as Hawking radiation.
Such theories have interesting properties: the black hole emits thermal radiation through most of its lifetime, until
it approaches Planck size. At this point, a small object is purifying a very large system (the emitted radiation).

[Figure 5. Horizontal axis: $\log N_{A_1}$.]

FIG. 5: Page's scenario generalised to GPTs. Close to the end of the black hole's lifetime, the Wootters-Hardy parameter $r$
associated with a theory has significant impact on the rate at which information leaves the hole. The full line is the standard
quantum case with $r = 2$, reproducing Page's curve [20]; the dashed line corresponds to $r = 3$, and the dotted line to $r = 4$.

In quantum theory, such a small object would need to have a huge amount of entropy, violating the conjectured 
entropy bound of Bekenstein, and suffering from several issues associated with long-lived remnants. For example, 
objects with high entropy take a long time to release their information, meaning that the black hole would take a 
large amount of time to finish evaporating [THUTlj. Highly entropic objects are also expected to couple very strongly 
to all other interactions, as they have a phase space factor which is proportional to the number of degrees of freedom 
N they possess [17].

These problems need not plague the final stages of black hole evaporation in these generalised theories. At the end
of its lifetime, the black hole will have large K, so that it is described by a large number of parameters, but almost
all of these states are physically indistinguishable (it has a small value of N), and so its physical entropy is very
small [39]. In this sense, it would not violate entropy bounds, and the final few bits of information could come out at
a reasonably fast rate. It is unclear whether the black-hole decay rate would have a phase space factor proportional 
to N or to K, a question which has not been considered, given that N and K are so closely related in quantum and 
classical theory. If it were proportional to N rather than K, it would mean that Planck sized black holes which purify 
large systems need not couple strongly to other forms of matter, evading one of the main objections to such objects. 

VII. CONCLUDING REMARKS 

Given the difficulty encountered when trying to apply standard quantum theory to gravity, it is reasonable to 
assume that a theory which consistently combines quantum theory and gravity will have to modify quantum theory 
in some way. Given the central importance of black holes in guiding our research in finding such a theory, it is natural 
to explore how our understanding of black holes changes if we modify quantum theory. The goal of this work was to 
obtain a glimpse on possible new effects and modified information-theoretic behaviour that might appear in the more 
general situation beyond quantum theory. 

Here, we have seen that several important aspects of the black hole information problem are modified in the more general
setting. In particular, if the fundamental theory of nature preserves information, then the point at which information
is likely to escape a black hole is theory dependent. For a broad and general class of generalisations of quantum theory,
we have seen that the point at which information escapes a black hole is given by the Wootters-Hardy parameter r, a
parameter which relates the degrees of freedom needed to describe a state with the degrees of freedom which can be
distinguished in a measurement. Quantum theory corresponds to r = 2, the case where information must begin
to escape the black hole when half the Hawking photons have escaped. This makes it difficult to reconcile unitarity
with the apparently semi-classical nature of black hole radiation. However, by increasing the parameter, one can
delay the point at which information generically escapes the black hole, even to the point where it only escapes in the
final burst of radiation, when the black hole is no longer semi-classical, and we expect quantum gravitational effects
to come into play.

We have also seen that contrary to the quantum case, information can escape the black hole, but not appear 
outside of it. There can be a period of time when the information is delocalised, providing an alternative to black 
hole complementarity. This is particularly relevant when the black hole is initially entangled with the outside world, 
and information thrown in at this point would generically escape very quickly. 

Our result however comes with important caveats. Here, we have only considered modifying the theory which 
governs matter, and have not considered changes to the space-time structure. A full understanding of the black hole 
information problem will presumably require a better understanding of what the quantum theory of gravity will look 
like. We also do not know how viable the various generalisations of quantum theory are; explicitly constructing and 
classifying these theories is a subject of current research. So, although we may consider theories with larger values of
r, it may be that these are ruled out by other considerations. Constructing such theories, and understanding them 
better, remains an important task. 

Finally, it is worth pointing out that our generalised decoupling result gives an amusing insight into quantum 
information theory. It has long been a source of discussion as to why quantum theory differs from classical information 
theory, often by a factor of 1/2. For example, super-dense coding [40] allows us to send twice as much information
as in the classical case. Or for the decoupling theorem in the quantum case, we have logc^ ^> \ ■ n ■ I(C : A). Here 
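we see that the 1/2 in this expression comes from the Wootters-Hardy parameter r. This is because in quantum theory,
log N bits are carried by systems with log K = 2 log N, and in more general theories, log K = r log N.

| General state space $A$ | $N$-level QT | $N$-level CPT |
|---|---|---|
| real vector space $A$ | space of $N \times N$ Hermitian matrices | $\mathbb{R}^N$ |
| $N_A$ = max. # of perfectly dist. states | Hilbert space dim. $N_A = N = \dim \mathcal{H}$ | $N_A = N$ = # of levels |
| $K_A = \dim A$, number of parameters to specify unnormalized state | $K_A = N_A^2$, number of real parameters in an unnormalized density matrix | $K_A = N_A$, number of probabilities to specify the distribution |
| $A_+$ set of unnormalized states | positive semidefinite matrices | vectors with non-negative entries |
| $\Omega_A$ set of normalized states $\omega$ | set of density matrices $\rho$ | set of probability vectors $p$ |
| $u^A$ unit effect | $u^A(\rho) = \mathrm{tr}(\rho)$ | $u^A(p) = p_1 + p_2 + \ldots + p_N$ |
| $\mu^A$ maximally mixed state | $\mu^A = \mathbf{1}/N_A$ | $\mu^A = (1/N_A, \ldots, 1/N_A)$ |
| Bloch vector $\hat\omega \in \hat A$ | $\hat\rho = \rho - \mathbf{1}/N_A$ | $\hat p = (p_1 - 1/N_A, \ldots, p_{N_A} - 1/N_A)$ |
| invariant inner product $\langle \hat\omega, \hat\varphi \rangle$ | $\langle \hat\rho, \hat\sigma \rangle = \frac{N_A}{N_A - 1}\, \mathrm{tr}(\hat\rho \hat\sigma)$ | $\langle \hat p, \hat q \rangle = \frac{N_A}{N_A - 1} \sum_i \hat p_i \hat q_i$ |
| purity $\mathcal{P}(\omega)$ | $\mathcal{P}(\rho) = \frac{N_A}{N_A - 1}\, \mathrm{tr}(\rho^2) - \frac{1}{N_A - 1}$ | $\mathcal{P}(p) = \frac{N_A}{N_A - 1} \sum_i p_i^2 - \frac{1}{N_A - 1}$ |
| group of reversible transf. $\mathcal{G}_A$ | projective unitary group, $\rho \mapsto U \rho U^\dagger$ | group of permutations $S_N$ |
| centered class. subsystem $\omega_1, \ldots, \omega_{N_A}$ | ONB $\vert\psi_1\rangle\langle\psi_1\vert, \ldots, \vert\psi_{N_A}\rangle\langle\psi_{N_A}\vert$ | $\omega_1 = (1, 0, \ldots), \ldots, \omega_{N_A} = (\ldots, 0, 1)$ |
| $\Vert\omega - \varphi\Vert_2$ for normalized states $\omega, \varphi$ | $\sqrt{\frac{N_A}{N_A - 1}} \cdot \sqrt{\mathrm{tr}(\omega - \varphi)^2}$ | $\sqrt{\frac{N_A}{N_A - 1}} \cdot \sqrt{\sum_i (\omega_i - \varphi_i)^2}$ |
| $\Vert\omega - \varphi\Vert_1$ for normalized states $\omega, \varphi$ | matrix 1-norm (2 x trace distance) | variational distance $\sum_i \vert\omega_i - \varphi_i\vert$ |

TABLE I: Our various GPT notions in the special cases of quantum theory (QT) and classical probability theory (CPT).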



Appendix A: Technical Supplement: Decoupling Theorem for probabilistic theories 

1. Setup and notation 

For the main definitions of the GPT framework (together with their physical interpretation), we refer the reader 
to [33]. We recommend [5] H3J [T3] as background for readers unfamiliar with GPTs. We use the definitions and 
notation as they are introduced in [33], and also some of the results presented there. The GPT framework contains 
quantum theory (QT) and classical probability theory (CPT) as special cases. Table [I] gives an (incomplete) overview. 

The quantum decoupling theorem which we shall be generalising is described, for example, in [28]. The notation is
described in Figure 6.

2. Proof of the GPT decoupling theorem 

When considering more general theories we shall be making certain assumptions which we now state more formally. 
Most of these assumptions are necessary if one wants to probe the issue of information escape in black holes, and 
their justification is discussed in the main body of this paper. 

Definition 2. We say that a collection of state spaces satisfies the standard assumptions if the following conditions 
are all satisfied: 

• Transitivity: For every pair of pure states $\varphi, \omega$ on a state space $A$, there is a reversible transformation $G \in \mathcal{G}_A$
with $G\varphi = \omega$.

• Local tomography: If A and B are state spaces, then the joint state space AB has the property that global 
states are uniquely determined by the statistics and correlations of local measurement outcomes. 

• Centered classical subsystems: Every state space $A$ contains at least one dynamical centered classical subsystem
$\omega_1, \ldots, \omega_{N_A}$; that is, a set of pure and perfectly distinguishable states $\omega_1, \ldots, \omega_{N_A}$ that average to the
maximally mixed state,

$$\mu^A = \frac{1}{N_A} \sum_{i=1}^{N_A} \omega_i,$$

such that every permutation on this set of states can be accomplished by some reversible transformation. Moreover,
we assume that composite state spaces $AB$ contain at least one dynamical centered classical subsystem
which is the product of two such classical subsystems on $A$ and $B$.

• Irreducibility: On every state space $A$ (also the composite ones), the group of reversible transformations is an
irrep in the usual sense [30], i.e. acts irreducibly on the complexification of $\hat A := \{x \in A \mid u^A(x) = 0\}$. (Note
that this is a bit stronger than the irreducibility in [34], which merely demands that $\mathcal{G}_A$ does not leave any
non-trivial subspaces of the real vector space $\hat A$ invariant.)

[Figure 6. Diagram labels: "E: All previous radiation"; "information".]

FIG. 6: (This figure is also in the main body of the text but is reproduced here in order to make the technical supplement
self-contained.) Alice throws her message into the black hole. There are four systems: Alice's message ($M$), the earlier black hole
($B$), Bob ($E$) and the reference system ($C$). (The reference system is particularly natural to include in the case of quantum theory
where one may always take $C$ to be a purifying system such that $\psi^{AEC}$ is pure, but we shall not be assuming such purification
is always possible.) Then a reversible interaction $U$ is applied to Alice's system, representing the black hole dynamics acting
on her diary after she has thrown it in. At some given subsequent time some of the black hole ($A_1$) has leaked out, e.g. via
Hawking radiation, and is now in the possession of Bob, and relabeled as $E_2$. Bob also holds any radiation predating Alice's
message having entered the black hole ($E$), and he can perform a joint operation $W$ on the system $EA_1 = E_1 E_2$. We also have
$BM = A_1 A_2 = A$.

We shall use certain definitions from [34] which generalise key quantum quantities such as purity. We include them
here for completeness; for more details such as the operational interpretation of purity see [34].

Definition 3 (Maximally mixed state). If $A$ is a transitive dynamical state space, let $\omega \in \Omega_A$ be an arbitrary pure
state, and define the maximally mixed state $\mu^A$ on $A$ by

$$\mu^A := \int_{G\in\mathcal{G}_A} G(\omega)\, dG.$$

The Haar measure on the evolution group $\mathcal{G}$ exists because $\mathcal{G}$ is compact [32]. Note that it follows from transitivity
that $\mu^A$ does not depend on the choice of $\omega$.

Definition 4 (Bloch vector). Given any state $\omega \in \Omega_A$ (or, more generally, any point $\omega \in A$ with $u^A(\omega) = 1$), we
define its corresponding Bloch vector $\hat\omega$ as

$$\hat\omega := \omega - \mu^A.$$

Definition 5 (Purity). Let $A$ be a transitive and irreducible state space, and let $\langle\cdot,\cdot\rangle$ be the unique inner product on
$\hat A$ such that all transformations are orthogonal and $\langle\hat\alpha, \hat\alpha\rangle = 1$ for pure states $\alpha$. Then, the purity $\mathcal{P}(\omega)$ of any state
$\omega \in \Omega_A$ is defined as the squared length of the corresponding Bloch vector, i.e.

$$\mathcal{P}(\omega) := \|\hat\omega\|^2 = \langle\hat\omega, \hat\omega\rangle.$$

We have $\mathcal{P} = 1$ for pure states and $\mathcal{P} = 0$ for the maximally mixed state, cf. [34]. Now we prove a claim made in
the main text.

Lemma 6. Suppose that we have a family $\mathcal{F}$ of state spaces, such that for every $n \in \mathbb{N}$, there is exactly one state
space $A(n) \in \mathcal{F}$ such that the maximal number of perfectly distinguishable states is $N_{A(n)} = n$. Moreover, suppose that
every pair of state spaces $A, B \in \mathcal{F}$ has a composite $AB \in \mathcal{F}$, and that $m < n$ implies for the state space dimensions
$K_{A(m)} < K_{A(n)}$. If all state spaces in $\mathcal{F}$ and their compositions satisfy our standard assumptions, then there is an
integer $r \in \mathbb{N}$ such that for every state space $A \in \mathcal{F}$,

$$K_A = N_A^r.$$

Proof. As we have shown, our standard assumptions imply that $K_{AB} = K_A K_B$ and $N_{AB} = N_A N_B$. Furthermore,
the assumptions of the lemma imply that $K$ is a strictly increasing function of $N$. Thus, it follows from [9, Appendix
2] that there is some integer $r \in \mathbb{N}$ such that $K = N^r$. □

Any state space $A$ can be decomposed into a direct sum of a $(K_A - 1)$-dimensional subspace $\hat A$, containing all
vectors $x$ with $u^A(x) = 0$, and the one-dimensional subspace $\mathbb{R} \cdot \mu^A$, the span of the maximally mixed state $\mu^A$. So
far, we have introduced an invariant inner product on $\hat A$. It will be convenient for the following calculation to extend
this inner product to all of $A$ in a particular way:

Definition 7. If an irreducible state space $A$ carries a centered classical subsystem, then we define an inner product
on $A$ by

$$\langle x, y\rangle := \langle\hat x, \hat y\rangle + \frac{x_0 y_0}{N_A - 1},$$

where $x$ and $y$ are decomposed as $x = \hat x + x_0 \mu^A$, $y = \hat y + y_0 \mu^A$, such that $\hat x, \hat y \in \hat A$.

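The choice of the denominator $(N_A - 1)$ seems arbitrary at first sight, but it turns out to simplify calculations a
lot. This is due to the following lemma:

Lemma 8. Suppose that $\{A, B, AB\}$ satisfy the standard assumptions. Then, for arbitrary vectors $\alpha^A, \gamma^A \in A$ and
$\beta^B, \delta^B \in B$, we have

$$\langle\alpha^A \otimes \beta^B | \gamma^A \otimes \delta^B\rangle = \frac{(N_A - 1)(N_B - 1)}{N_A N_B - 1}\, \langle\alpha^A | \gamma^A\rangle \langle\beta^B | \delta^B\rangle. \qquad (A1)$$

In other words,

$$(\alpha^A \otimes \beta^B| = \frac{(N_A - 1)(N_B - 1)}{N_A N_B - 1}\, (\alpha^A| \otimes (\beta^B|.$$

Moreover, the unit effect $u^B$ on $B$ is $(u^B| = (N_B - 1)(\mu^B|$; that is,

$$u^B(x) = (N_B - 1)\langle\mu^B | x\rangle \quad \text{for all } x \in B.$$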

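Proof. Recall the decomposition of $AB = A \otimes B$ into the four subspaces

$$AB = (\hat A \otimes \hat B) \oplus (\hat A \otimes \mu^B) \oplus (\mu^A \otimes \hat B) \oplus \mathbb{R}\mu^{AB}.$$

We know that these four subspaces are mutually orthogonal. Therefore, writing any vector $\alpha^A \in A$ as $\alpha^A = \hat\alpha^A + \alpha_0 \mu^A$,
where $\hat\alpha^A \in \hat A$ (and similarly for the other vectors), we get

$$\langle\alpha^A \otimes \beta^B | \gamma^A \otimes \delta^B\rangle = \langle\hat\alpha^A \otimes \hat\beta^B + \beta_0 \hat\alpha^A \otimes \mu^B + \alpha_0 \mu^A \otimes \hat\beta^B + \alpha_0\beta_0 \mu^{AB} \,|\, \hat\gamma^A \otimes \hat\delta^B + \delta_0 \hat\gamma^A \otimes \mu^B + \gamma_0 \mu^A \otimes \hat\delta^B + \gamma_0\delta_0 \mu^{AB}\rangle$$
$$= \langle\hat\alpha^A \otimes \hat\beta^B | \hat\gamma^A \otimes \hat\delta^B\rangle + \beta_0\delta_0 \langle\hat\alpha^A \otimes \mu^B | \hat\gamma^A \otimes \mu^B\rangle + \alpha_0\gamma_0 \langle\mu^A \otimes \hat\beta^B | \mu^A \otimes \hat\delta^B\rangle + \alpha_0\beta_0\gamma_0\delta_0 \langle\mu^{AB} | \mu^{AB}\rangle.$$

According to [34], we know the values of all inner products except for the first one: we have
$\langle\hat\alpha^A \otimes \mu^B | \hat\gamma^A \otimes \mu^B\rangle = \frac{N_A - 1}{N_A N_B - 1} \langle\hat\alpha^A | \hat\gamma^A\rangle$ and $\langle\mu^A \otimes \hat\beta^B | \mu^A \otimes \hat\delta^B\rangle = \frac{N_B - 1}{N_A N_B - 1} \langle\hat\beta^B | \hat\delta^B\rangle$. By definition, $\langle\mu^{AB} | \mu^{AB}\rangle = \frac{1}{N_A N_B - 1}$. The
only unknown term is the first one. Now we argue by group theory: if $\mathcal{G}_A$ and $\mathcal{G}_B$ act complex-irreducibly on $\hat A$ and
$\hat B$, respectively, then $\mathcal{G}_A \otimes \mathcal{G}_B$ acts complex-irreducibly on $\hat A \otimes \hat B$. Hence the invariant inner product is of the form
$\langle\hat\alpha^A \otimes \hat\beta^B | \hat\gamma^A \otimes \hat\delta^B\rangle = \xi \langle\hat\alpha^A | \hat\gamma^A\rangle \langle\hat\beta^B | \hat\delta^B\rangle$, with some constant $\xi > 0$. We can determine this constant by considering
the special case where $\alpha^A$ and $\beta^B$ are pure states, and $\gamma^A = \alpha^A$ as well as $\delta^B = \beta^B$; then

$$\langle\alpha^A \otimes \beta^B | \gamma^A \otimes \delta^B\rangle = \langle(\alpha^A \otimes \beta^B)^\wedge + \mu^{AB} \,|\, (\alpha^A \otimes \beta^B)^\wedge + \mu^{AB}\rangle = \mathcal{P}(\alpha^A \otimes \beta^B) + \frac{1}{N_A N_B - 1} = \frac{N_A N_B}{N_A N_B - 1}.$$

Comparing this with the formula above gives $\xi = \frac{(N_A - 1)(N_B - 1)}{N_A N_B - 1}$. Substituting $\xi$, we may compare the formula
above with the right-hand side of (A1), and see that they agree. The second part of the lemma is proven by the
calculation

$$u^B(x) = u^B(\hat x) + x_0 u^B(\mu^B) = x_0 = (N_B - 1)\left(\langle\mu^B | \hat x\rangle + x_0 \langle\mu^B | \mu^B\rangle\right) = (N_B - 1)\langle\mu^B | x\rangle,$$

where we have again used the decomposition $x = \hat x + x_0 \mu^B$ with $\hat x \in \hat B$. □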
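
The determination of the constant $\xi$ is simple arithmetic and can be double-checked exactly. The sketch below assumes the values used in the proof, namely that for pure product states the four contributions are $\xi$, $\frac{N_A - 1}{N_A N_B - 1}$, $\frac{N_B - 1}{N_A N_B - 1}$ and $\frac{1}{N_A N_B - 1}$, and that their sum must equal $\frac{N_A N_B}{N_A N_B - 1}$. It also checks that this agrees with the product form (A1), where a pure state has extended squared norm $\frac{N}{N-1}$. Function names are illustrative.

```python
from fractions import Fraction


def xi(n_a, n_b):
    """Candidate constant (N_A - 1)(N_B - 1)/(N_A N_B - 1)."""
    return Fraction((n_a - 1) * (n_b - 1), n_a * n_b - 1)


def pure_product_norm_sq(n_a, n_b):
    """Sum of the four orthogonal contributions for a pure product state."""
    d = n_a * n_b - 1
    return xi(n_a, n_b) + Fraction(n_a - 1, d) + Fraction(n_b - 1, d) + Fraction(1, d)


def product_formula_norm_sq(n_a, n_b):
    """Right-hand side of (A1) for pure states: xi * N_A/(N_A-1) * N_B/(N_B-1)."""
    return xi(n_a, n_b) * Fraction(n_a, n_a - 1) * Fraction(n_b, n_b - 1)


sizes = [(2, 2), (2, 3), (4, 5), (7, 11)]
checks = [
    pure_product_norm_sq(a, b) == product_formula_norm_sq(a, b) == Fraction(a * b, a * b - 1)
    for a, b in sizes
]
```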
Similar reasoning proves the following lemma:

Lemma 9. If $X^A : A \to \mathbb{R}$ is a linear functional with $X^A(\mu^A) = 0$, then there is a vector $\hat X^A \in \hat A$ such that
$\langle\hat X^A, v\rangle = X^A(v)$ for all $v \in A$. This vector satisfies

$$(X^A \otimes u^B| = (N_B - 1)\,(X^A| \otimes (\mu^B| \quad \text{and} \quad |X^A \otimes u^B) = \frac{N_A N_B - 1}{N_A - 1}\, |X^A) \otimes |\mu^B).$$

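Twirling over one subsystem produces the maximal mixture on that subsystem. In more detail, we have the
following result:

Lemma 10. Suppose that $B$ is a transitive state space. Then, for all bipartite states $\omega^{AB}$,

$$\int_{T\in\mathcal{G}_B} (I \otimes T)(\omega^{AB})\, dT = \omega^A \otimes \mu^B,$$

where $\omega^A$ is the local reduced state on $A$.

Proof. There are pure states $\varphi_i^A$ spanning $A$, and pure states $\varphi_j^B$ spanning $B$. Hence their products $\varphi_i^A \otimes \varphi_j^B$ span
$AB$, and $\omega^{AB}$ can be written $\omega^{AB} = \sum_{ij} \alpha_{ij}\, \varphi_i^A \otimes \varphi_j^B$ with some real and not necessarily non-negative coefficients
$\alpha_{ij} \in \mathbb{R}$ and $\sum_{ij} \alpha_{ij} = 1$. Hence

$$\int_{T\in\mathcal{G}_B} (I \otimes T)(\omega^{AB})\, dT = \sum_{ij} \alpha_{ij} \int_{T\in\mathcal{G}_B} (I \otimes T)(\varphi_i^A \otimes \varphi_j^B)\, dT = \sum_{ij} \alpha_{ij}\, \varphi_i^A \otimes \int_{T\in\mathcal{G}_B} T\varphi_j^B\, dT = \left(\sum_{ij} \alpha_{ij}\, \varphi_i^A\right) \otimes \mu^B.$$

But $\sum_{ij} \alpha_{ij}\, \varphi_i^A$ is the local reduced state $\omega^A$. □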
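
In classical probability theory, where the reversible transformations on $B$ are the permutations of its levels (cf. Table I), the statement of Lemma 10 can be verified by brute force. The following sketch averages a joint distribution over all relabelings of $B$ and compares the result with the product of the $A$-marginal and the uniform distribution on $B$. The representation of the joint state as a nested list `joint[a][b]` and the function names are choices made here for illustration.

```python
from fractions import Fraction
from itertools import permutations


def twirl_over_b(joint):
    """Average joint[a][b] over all permutations of the B index."""
    n_a, n_b = len(joint), len(joint[0])
    perms = list(permutations(range(n_b)))
    averaged = [[Fraction(0)] * n_b for _ in range(n_a)]
    for perm in perms:
        for a in range(n_a):
            for b in range(n_b):
                # The permutation moves level b of system B to level perm[b].
                averaged[a][perm[b]] += Fraction(joint[a][b]) / len(perms)
    return averaged


def marginal_times_uniform(joint):
    """omega^A (x) mu^B for a classical joint distribution."""
    n_b = len(joint[0])
    return [[sum(row) / n_b] * n_b for row in joint]


joint = [
    [Fraction(1, 2), Fraction(0), Fraction(0)],
    [Fraction(0), Fraction(1, 4), Fraction(1, 4)],
]
twirled = twirl_over_b(joint)
```

Here the initial state is correlated, yet the twirled state is exactly the product of the marginal $(1/2, 1/2)$ with the uniform distribution on three levels.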
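In the following, we need a formula giving the inner product between an arbitrary bipartite state, and a product
state which is maximally mixed on $B$:

Lemma 11. Suppose that $\{A, B, AB\}$ satisfy the standard assumptions. Then, for every bipartite state $\sigma^{AB}$,

$$\langle\hat\sigma^{AB}, (\omega^A \otimes \mu^B)^\wedge\rangle = \langle\hat\sigma^A, \hat\omega^A\rangle \cdot \frac{N_A - 1}{N_A N_B - 1},$$

where $\sigma^A$ is the local reduced state on $A$.

Proof. We use twirling and Lemma 10:

$$\langle\hat\sigma^{AB}, (\omega^A \otimes \mu^B)^\wedge\rangle = \left\langle\hat\sigma^{AB}, \left(\int_T (I \otimes T)(\omega^A \otimes \mu^B)\, dT\right)^\wedge\right\rangle = \left\langle\hat\sigma^{AB}, \int_T (I \otimes T)(\omega^A \otimes \mu^B)^\wedge\, dT\right\rangle$$
$$= \int_T \langle\hat\sigma^{AB}, (I \otimes T)(\omega^A \otimes \mu^B)^\wedge\rangle\, dT = \int_T \langle(I \otimes T^{-1})\hat\sigma^{AB}, (\omega^A \otimes \mu^B)^\wedge\rangle\, dT$$
$$= \langle\hat\sigma^A \otimes \mu^B, \hat\omega^A \otimes \mu^B\rangle = \langle(\sigma^A \otimes \mu^B)^\wedge, (\omega^A \otimes \mu^B)^\wedge\rangle.$$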
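The claim follows. □

Lemma 12. Suppose that $CA_1A_2E$ and all its subsystems satisfy the standard assumptions. Then

$$\left\| \left(\sigma(T)^{CA_2} - \psi^C \otimes \mu^{A_2}\right)^\wedge \right\|_2^2 = \mathcal{P}\left(\sigma(T)^{CA_2}\right) - \frac{N_C - 1}{N_C N_{A_2} - 1}\, \mathcal{P}(\psi^C)$$

for every reversible transformation $T \in \mathcal{G}_A$.

Proof. We use all the previous lemmas and start by multiplying out the 2-norm:

$$\left\| \hat\sigma(T)^{CA_2} - (\psi^C \otimes \mu^{A_2})^\wedge \right\|_2^2 = \left\| \hat\sigma(T)^{CA_2} \right\|_2^2 - 2 \left\langle\hat\sigma(T)^{CA_2}, (\psi^C \otimes \mu^{A_2})^\wedge\right\rangle + \left\| (\psi^C \otimes \mu^{A_2})^\wedge \right\|_2^2$$
$$= \mathcal{P}\left(\sigma(T)^{CA_2}\right) - 2\, \frac{N_C - 1}{N_C N_{A_2} - 1} \left\langle\hat\sigma(T)^C, \hat\psi^C\right\rangle + \frac{N_C - 1}{N_C N_{A_2} - 1}\, \mathcal{P}(\psi^C).$$

Finally, use that $\sigma(T)^C = \psi^C$ as well as Theorem 28 from [32]. □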
The next lemma is a straightforward application of Schur's Lemma. The simple proof is omitted.

Lemma 13. Suppose that $A$ is transitive and complex-irreducible. Denote by $I_{\hat A}$ the orthogonal projector onto $\hat A$
(which is also the identity map on $\hat A$), and by $I_{\mu^A} = (N_A - 1)|\mu^A)(\mu^A|$ the orthogonal projector onto the span of $\mu^A$.
Then, for any matrix $M$ of compatible dimensions,

$$\int_{G\in\mathcal{G}_A} G M G^{-1}\, dG = \frac{\mathrm{Tr}(I_{\hat A} M I_{\hat A})}{K_A - 1}\, I_{\hat A} + \mathrm{Tr}(I_{\mu^A} M I_{\mu^A})\, I_{\mu^A} = \frac{\mathrm{Tr}(M I_{\hat A})}{K_A - 1}\, I_{\hat A} + (N_A - 1)(\mu^A | M | \mu^A)\, I_{\mu^A}.$$

Note that the condition of complex-irreducibility is important here; in the case of real-irreducibility only, the claim
of the lemma would only necessarily be true if M > 0. Here is an example for how we will use that lemma in the
following: if $\omega$ and $\varphi$ are states on $A$, then

$$\int_{G\in\mathcal{G}_A} G\, |\omega)(\varphi|\, G^{-1}\, dG = \frac{\langle\hat\omega | \hat\varphi\rangle}{K_A - 1}\, I_{\hat A} + |\mu^A)(\mu^A|.$$
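
The twirling formula can be checked exactly in the classical special case, where $\mathcal{G}_A$ is the permutation group $S_N$ acting on $\mathbb{R}^N$ and $K_A = N_A = N$. The sketch below is written in ordinary Euclidean terms, which is an assumption made here for convenience: for classical systems the extended inner product is a multiple of the Euclidean one, so the orthogonal projectors are $I_{\mu} = J/N$ (with $J$ the all-ones matrix) and $I_{\hat A} = \mathbf{1} - J/N$, and $(N_A - 1)(\mu^A|M|\mu^A)$ becomes $\mathrm{Tr}(I_\mu M)$. Function names are illustrative.

```python
from fractions import Fraction
from itertools import permutations


def twirl(matrix):
    """Average P M P^T over all permutation matrices P."""
    n = len(matrix)
    perms = list(permutations(range(n)))
    out = [[Fraction(0)] * n for _ in range(n)]
    for p in perms:
        for i in range(n):
            for j in range(n):
                # (P M P^T) has entry M[i][j] at position (p[i], p[j]).
                out[p[i]][p[j]] += Fraction(matrix[i][j]) / len(perms)
    return out


def schur_prediction(matrix):
    """a * I_hat + b * I_mu with a = Tr(I_hat M)/(N - 1) and b = Tr(I_mu M)."""
    n = len(matrix)
    total = sum(sum(Fraction(x) for x in row) for row in matrix)
    trace = sum(Fraction(matrix[i][i]) for i in range(n))
    b = total / n
    a = (trace - b) / (n - 1)
    return [
        [a * ((i == j) - Fraction(1, n)) + b * Fraction(1, n) for j in range(n)]
        for i in range(n)
    ]


M = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
```

For this non-symmetric test matrix both functions return the matrix with $16/3$ on the diagonal and $5$ elsewhere, as expected when the trivial and the standard representation of $S_N$ are the only invariant subspaces.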

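Lemma 14. If $CA_1A_2E$ and all its subsystems satisfy the standard assumptions, then

$$I := \int_{T_A\in\mathcal{G}_A} (I_C \otimes T_A)\, |\psi^{CA})(\psi^{CA}|\, (I_C \otimes T_A^{-1})\, dT_A$$
$$= \left(\mathrm{Tr}_A |\psi^{CA})(\psi^{CA}|\right) \otimes \frac{I_{\hat A}}{K_A - 1} + \frac{N_C - 1}{N_C N_A - 1}\, |\psi^C)(\psi^C| \otimes \left( I_{\mu^A} - \frac{I_{\hat A}}{K_A - 1} \right).$$

Note that there does not seem to be a simple relation between $\mathrm{Tr}_A |\psi^{CA})(\psi^{CA}|$ and $|\psi^C)(\psi^C|$ in general (the analog
of taking the partial trace in quantum theory would instead be $\mathrm{Id}_C \otimes u^A(\psi^{CA}) = \psi^C$).

Proof. There are real numbers $w_{ij}$, summing to one, and pure states $\varphi_i^C \in \Omega_C$ and $\varphi_j^A \in \Omega_A$ such that
$\psi^{CA} = \sum_{i=1}^{K_C} \sum_{j=1}^{K_A} w_{ij}\, \varphi_i^C \otimes \varphi_j^A$. Then, $I = \sum_{ijkl} w_{ij} w_{kl} I_{ijkl}$, with

$$I_{ijkl} = \int_{T_A\in\mathcal{G}_A} (I_C \otimes T_A)\, |\varphi_i^C \otimes \varphi_j^A)(\varphi_k^C \otimes \varphi_l^A|\, (I_C \otimes T_A^{-1})\, dT_A$$
$$= \frac{(N_C - 1)(N_A - 1)}{N_C N_A - 1} \int_{T_A\in\mathcal{G}_A} (I_C \otimes T_A)\, |\varphi_i^C) \otimes |\varphi_j^A)(\varphi_k^C| \otimes (\varphi_l^A|\, (I_C \otimes T_A^{-1})\, dT_A$$
$$= \frac{(N_C - 1)(N_A - 1)}{N_C N_A - 1}\, |\varphi_i^C)(\varphi_k^C| \otimes \int_{T_A\in\mathcal{G}_A} T_A\, |\varphi_j^A)(\varphi_l^A|\, T_A^{-1}\, dT_A.$$

The remaining integral is evaluated with Lemma 13. Then, substitute the identity $\langle\hat\varphi_j^A | \hat\varphi_l^A\rangle = \langle\varphi_j^A | \varphi_l^A\rangle - \frac{1}{N_A - 1}$ and multiply out. Put in the sum over $ijkl$ again. The
resulting terms can be identified as follows: From

$$|\varphi_i^C \otimes \varphi_j^A)(\varphi_k^C \otimes \varphi_l^A| = \frac{(N_C - 1)(N_A - 1)}{N_C N_A - 1}\, |\varphi_i^C)(\varphi_k^C| \otimes |\varphi_j^A)(\varphi_l^A|$$

it follows that

$$\mathrm{Tr}_A |\psi^{CA})(\psi^{CA}| = \frac{(N_C - 1)(N_A - 1)}{N_C N_A - 1} \sum_{ijkl} w_{ij} w_{kl}\, \langle\varphi_j^A | \varphi_l^A\rangle\, |\varphi_i^C)(\varphi_k^C|, \qquad (A2)$$

and

$$|\psi^C)(\psi^C| = \Big|\sum_{ij} w_{ij}\, \varphi_i^C\Big)\Big(\sum_{kl} w_{kl}\, \varphi_k^C\Big| = \sum_{ijkl} w_{ij} w_{kl}\, |\varphi_i^C)(\varphi_k^C|.$$

This proves the claim. □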
Using all the previous lemmas, we can express the desired integral over the purity in the following way: 

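Lemma 15. Suppose that $CA_1A_2E$ and all its subsystems satisfy the standard assumptions. If $X^{CA_2}$ is any Pauli map on $CA_2$, then

\[
\int_{T \in \mathcal{G}_A} \mathcal{P}\big(\sigma(T)^{CA_2}\big)\, dT = (K_C K_{A_2} - 1)\, \langle X^{CA_2} \otimes u^{A_1} | J | X^{CA_2} \otimes u^{A_1} \rangle , \tag{A3}
\]

where

\[
J := \int_{G \in \mathcal{G}_{CA_2}} (G \otimes I_{A_1})\, I\, (G^{-1} \otimes I_{A_1})\, dG , \tag{A4}
\]

and $I$ is the expression from Lemma 14.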

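Proof. As shown in [34], purity can be expressed as a twirling integral:

\[
\mathcal{P}\big(\sigma(T)^{CA_2}\big) = (K_C K_{A_2} - 1) \int_{G \in \mathcal{G}_{CA_2}} \Big( (X^{CA_2} \circ G)\big(\sigma(T)^{CA_2}\big) \Big)^2 dG ,
\]

if $X^{CA_2}$ is any Pauli map on $CA_2$. Substituting

\[
X^{CA_2} \circ G\big(\sigma(T)^{CA_2}\big) = X^{CA_2} \otimes u^{A_1} \big( G \otimes I_{A_1} (\sigma(T)^{CA}) \big) = \langle X^{CA_2} \otimes u^{A_1} | G \otimes I_{A_1} | \sigma(T)^{CA} \rangle
\]

and $\sigma(T)^{CA} = I_C \otimes T_A\, \psi^{CA}$ and then reversing the order of integration proves the claim. □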
Due to Schur's Lemma, evaluating twirling integrals amounts to projecting onto invariant subspaces. Therefore, we 
need the following lemma that states some of the projection identities that will be used in this procedure. 

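Lemma 16. Under the standard assumptions, we have the following projection identities:

\[
\begin{aligned}
I_{\mu^{CA_2}} \big( |\psi^C\rangle\langle\psi^C| \otimes I_{A_2} \big) I_{\mu^{CA_2}} &= \frac{1}{N_C - 1}\, I_{\mu^{CA_2}} , \\
I_{\mu^{CA_2}} \big( |\psi^C\rangle\langle\psi^C| \otimes I_{\mu^{A_2}} \big) I_{\mu^{CA_2}} &= \frac{1}{N_C - 1}\, I_{\mu^{CA_2}} , \\
I_{(CA_2)^\wedge} \big( |\psi^C\rangle\langle\psi^C| \otimes I_{A_2} \big) I_{(CA_2)^\wedge} &= |\psi^C\rangle\langle\psi^C| \otimes I_{\hat A_2} + |\hat\psi^C\rangle\langle\hat\psi^C| \otimes I_{\mu^{A_2}} , \\
I_{(CA_2)^\wedge} \big( |\psi^C\rangle\langle\psi^C| \otimes I_{\mu^{A_2}} \big) I_{(CA_2)^\wedge} &= |\hat\psi^C\rangle\langle\hat\psi^C| \otimes I_{\mu^{A_2}} , \\
I_{\mu^{CA_2}} \big( \mathrm{Tr}_A |\psi^{CA}\rangle\langle\psi^{CA}| \otimes I_{A_2} \big) I_{\mu^{CA_2}} &= \frac{N_A - 1}{N_C N_A - 1}\, \langle\psi^A|\psi^A\rangle\, I_{\mu^{CA_2}} , \\
I_{\mu^{CA_2}} \big( \mathrm{Tr}_A |\psi^{CA}\rangle\langle\psi^{CA}| \otimes I_{\mu^{A_2}} \big) I_{\mu^{CA_2}} &= \frac{N_A - 1}{N_C N_A - 1}\, \langle\psi^A|\psi^A\rangle\, I_{\mu^{CA_2}} , \\
I_{(CA_2)^\wedge} \big( \mathrm{Tr}_A |\psi^{CA}\rangle\langle\psi^{CA}| \otimes I_{A_2} \big) I_{(CA_2)^\wedge} &= \big( \mathrm{Tr}_A |\psi^{CA}\rangle\langle\psi^{CA}| \big) \otimes I_{\hat A_2} \\
&\quad + \big( I_{\hat C}\, \mathrm{Tr}_A |\psi^{CA}\rangle\langle\psi^{CA}|\, I_{\hat C} \big) \otimes I_{\mu^{A_2}} , \\
I_{(CA_2)^\wedge} \big( \mathrm{Tr}_A |\psi^{CA}\rangle\langle\psi^{CA}| \otimes I_{\mu^{A_2}} \big) I_{(CA_2)^\wedge} &= \big( I_{\hat C}\, \mathrm{Tr}_A |\psi^{CA}\rangle\langle\psi^{CA}|\, I_{\hat C} \big) \otimes I_{\mu^{A_2}} .
\end{aligned}
\]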
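Proof. To prove the first four identities, multiply out the expression in between the two projection operators, using $\psi^C = \hat\psi^C + \mu^C$ and $I_{A_2} = I_{\hat A_2} + I_{\mu^{A_2}}$, and check for each addend the corresponding action on the subspace structure.

For the remaining identities, note that due to (A2) we can write

\[
\mathrm{Tr}_A |\psi^{CA}\rangle\langle\psi^{CA}| = \frac{(N_C - 1)(N_A - 1)}{N_C N_A - 1} \sum_{ik} \gamma_{ik}\, |\varphi_i^C\rangle\langle\varphi_k^C| , \tag{A5}
\]

where $\gamma_{ik} = \sum_{jl} w_{ij} w_{kl} \langle\varphi_j^A|\varphi_l^A\rangle$. Hence

\[
\sum_{ik} \gamma_{ik} = \Big( \sum_{ij} w_{ij} \langle\varphi_j^A| \Big) \Big( \sum_{kl} w_{kl} |\varphi_l^A\rangle \Big) = \langle\psi^A|\psi^A\rangle = \langle\hat\psi^A|\hat\psi^A\rangle + \langle\mu^A|\mu^A\rangle = \mathcal{P}(\psi^A) + \frac{1}{N_A - 1} .
\]

To compute the remaining four identities, treat the $|\varphi_i^C\rangle\langle\varphi_k^C|$ appearing in (A5) exactly as $|\psi^C\rangle\langle\psi^C|$ in the first four equations. □

Now we are finally ready to prove a first version of the decoupling theorem:

Lemma 17. If the standard assumptions are satisfied, then

\[
\int_{T \in \mathcal{G}_A} \mathcal{P}\big(\sigma(T)^{CA_2}\big)\, dT
= \mathcal{P}(\psi^{CA}) \cdot \frac{(K_{A_2} - 1)(N_C N_A - 1)}{(N_C N_{A_2} - 1)(K_A - 1)}
+ \mathcal{P}(\psi^C) \cdot \frac{(N_C - 1)(K_A - K_{A_2})}{(N_C N_{A_2} - 1)(K_A - 1)} .
\]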



Proof. We use the identities and notation from Lemma 15, and start by computing $J$ using (A4). First, we expand $I$ into a sum of tensor product expressions:



(Tr A H CA )(^ CA \) ® ® I A2 - (Tr A |^ CA )(V> CA |) 



K A 



K A -1 



Nr — 1 ^ ^ 



A c A A - 1 ' 



Then, we use Lemma 16 and Lemma 13 to twirl over that expression and get $J$. This involves lengthy, but straightforward algebra and the identity $\mathrm{Tr}\big( \mathrm{Tr}_A |\psi^{CA}\rangle\langle\psi^{CA}| \big) = \langle\psi^{CA}|\psi^{CA}\rangle$. We first get the following expression for $J$ (some addends are marked for later reference):



/ 



J = 



K A -l 



\ 



(^ A \^ A )(K A2 -1) + (^ A \^ A ) N A — 1 A A 
K C K A 1 ^ CA ^ + N C N A -I {qp ^ 



V 



(*) 



/ 



K A -l 



A, -1 



(*) 



- I, 
7V C A A - 1 M 



A c - 1 l Al 
' N C N A -1K A -1 



N c - 1 Vi 



AcA^, - 1 

/,C|„/,C 



A c - 1 
(*) / 



(^'|^')(A A2 -1) + (^|^) 
^c#a 2 - 1 



HCA 2 ) 



V 



Ac-1 
(*) / 



^ c \^ c ) 
K c K Ad - 



HcA 2 y 



Nn-1 
Recall identity (A3). In this identity, we are free to choose an arbitrary Pauli map $X^{CA_2}$ on $CA_2$. A useful choice is

X CA2 := J — X c <g u A2 , where X c is any Pauli map on C. The expression X CA2 <g u Al appearing in (A3) 

y N c Na 2 — 1 



becomes a multiple of $X^C \otimes u^A$. In the usual decomposition of $CA$ into four invariant orthogonal subspaces, this functional is zero on all subspaces except for the subspace $\hat C \otimes \mu^A$. In particular, this functional gives zero if it is evaluated on all addends that are marked with a (*) above. Therefore, we have $\langle X^C \otimes u^A | J | X^C \otimes u^A \rangle = \langle X^C \otimes u^A | J' | X^C \otimes u^A \rangle$, where $J'$ is $J$ without the marked addends, that is,

J = (^-D(^ 2 -D Ul I(CA2)A " (tfA-l)(*C^-l) Vl I(CA2)A 

(jv c - m c \^p c ) _ _ ivc-i (^ cx 



<g Ifr/u^ + t w r • 1,, a, <8> JhrMoV 

m 1 l^ai / «t at . _ -\\rtr . _ i \ ^ . _ i m 1 ioa 2 ; 



(NcNa — 1)(KqK A2 — 1) l °" 2j (JV C JV> - - 1) ^c^a 2 -1 

^Vc-i (V> g |^)(i^ 2 -l) + (^ c ) 
-(JVc^-l)(^-l) X C ^ 2 - 1 ^ I(CA2)A ' 

Sandwiching this expression with $X^C \otimes u^A$ removes all other addends that are not supported on $\hat C \otimes \mu^A$. That is, we can once again replace $J'$ by another expression $J''$ in which all those terms are removed. So we have $\langle X^C \otimes u^A | J | X^C \otimes u^A \rangle = \langle X^C \otimes u^A | J'' | X^C \otimes u^A \rangle$, and some simplification yields

j" - l, A (W^) A ^ - 1 +<p(4P) (Nc-l)(K A -K A2 ) \ 

=:« 

For the final step, we use Lemma 9 and calculate
/ T(a(T) CA2 )dT = {K C K A2 - 1){X CA2 ®u M \J"\X CA2 ®u M ) = (K C K A2 - 1){X CA2 ® u M \X CA2 ® u M )£ 

JTzQa 

= (K C K A2 - l)£(N Al - 1)(A C ^| ® (M^l ^f ^'7 V CAa ) ® |^) 



'AWVa 2 -i 

Resubstituting $\xi$ proves the claim. □
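Corollary 18 (Decoupling, 2-norm version). The standard assumptions imply

\[
\int_{T \in \mathcal{G}_A} \big\| \sigma(T)^{CA_2} - \psi^C \otimes \mu^{A_2} \big\|_2^2\, dT
= \mathcal{P}(\psi^{CA}) \cdot \frac{(N_C N_A - 1)(K_{A_2} - 1)}{(N_C N_{A_2} - 1)(K_A - 1)}
- \mathcal{P}(\psi^C) \cdot \frac{(N_C - 1)(K_{A_2} - 1)}{(N_C N_{A_2} - 1)(K_A - 1)} .
\]

Proof. Substitute the result of Lemma 12 into Lemma 17. □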
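The algebra connecting Lemma 12, Lemma 17 and Corollary 18 can be checked with exact rational arithmetic. The sketch below assumes the following closed forms: the averaged purity is $\mathcal{P}(\psi^{CA})\,(K_{A_2}-1)(N_C N_A-1)/D + \mathcal{P}(\psi^C)\,(N_C-1)(K_A-K_{A_2})/D$ with $D = (N_C N_{A_2}-1)(K_A-1)$, the averaged squared 2-norm distance is $(K_{A_2}-1)\,[\mathcal{P}(\psi^{CA})(N_C N_A-1) - \mathcal{P}(\psi^C)(N_C-1)]/D$, and Lemma 12 subtracts $\mathcal{P}(\psi^C)(N_C-1)/(N_C N_{A_2}-1)$ from the purity. The function names and the sample numbers (quantum-like counting $K = N^2$, and purity values picked only to exercise the algebra) are hypothetical.

```python
from fractions import Fraction

def avg_purity(p_ca, p_c, n_c, n_a, n_a2, k_a, k_a2):
    """Assumed closed form for the averaged purity of sigma(T)^{CA_2}."""
    denom = (n_c * n_a2 - 1) * (k_a - 1)
    term_ca = p_ca * Fraction((k_a2 - 1) * (n_c * n_a - 1), denom)
    term_c = p_c * Fraction((n_c - 1) * (k_a - k_a2), denom)
    return term_ca + term_c

def avg_sq_distance(p_ca, p_c, n_c, n_a, n_a2, k_a, k_a2):
    """Assumed closed form for the averaged squared 2-norm distance."""
    denom = (n_c * n_a2 - 1) * (k_a - 1)
    return Fraction(k_a2 - 1, denom) * (p_ca * (n_c * n_a - 1) - p_c * (n_c - 1))

def lemma12_offset(p_c, n_c, n_a2):
    """Term subtracted from the purity when passing to the squared distance."""
    return p_c * Fraction(n_c - 1, n_c * n_a2 - 1)

# Illustrative numbers only: K = N^2 as in quantum theory, arbitrary purities.
dims = dict(n_c=2, n_a=8, n_a2=2, k_a=64, k_a2=4)
p_ca, p_c = Fraction(1), Fraction(1, 3)

lhs = avg_purity(p_ca, p_c, **dims) - lemma12_offset(p_c, dims["n_c"], dims["n_a2"])
rhs = avg_sq_distance(p_ca, p_c, **dims)
consistent = (lhs == rhs)

# Limiting cases: A_2 trivial (everything discarded) and A_1 trivial (nothing discarded).
trivial_a2 = dict(n_c=2, n_a=8, n_a2=1, k_a=64, k_a2=1)
no_discard = dict(n_c=2, n_a=8, n_a2=8, k_a=64, k_a2=64)
dist_trivial_a2 = avg_sq_distance(p_ca, p_c, **trivial_a2)
purity_trivial_a2 = avg_purity(p_ca, p_c, **trivial_a2)
purity_no_discard = avg_purity(p_ca, p_c, **no_discard)
print(consistent, dist_trivial_a2, purity_trivial_a2, purity_no_discard)
```

The two limiting cases behave as one would expect: if $A_2$ is trivial, the reduced state on $C$ is untouched and the distance vanishes, and if nothing is discarded, the averaged purity is that of $\psi^{CA}$.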
For technical reasons, we have thus far worked with the 2-norm distance, $\|\cdot\|_2$. As in quantum theory, this norm is easy to work with, because it is invariant with respect to reversible transformations. However, again in analogy to quantum theory, it does not in itself possess a natural operational interpretation.

It turns out that one additional assumption allows us to relate the 2-norm to the maximal probability of distinguishing two states, which is the GPT analog of quantum theory's 1-norm, or trace distance.

Definition 19 (1-norm). For any state space $A$, and any vector $x \in A$, define its 1-norm by

\[
\|x\|_1 := 2 \max_{0 \leq e \leq u^A} |e(x)| ,
\]

where the maximum is over all effects $e$ with $0 \leq e \leq u^A$, i.e. $0 \leq e(\omega) \leq 1$ for all $\omega \in \Omega_A$.

Consequently, the 1-norm of the difference of two states tells us the maximal difference of outcome probabilities of any possible measurement that can be applied to the states:

\[
\frac{1}{2} \|\varphi - \omega\|_1 = \max_{0 \leq e \leq u^A} |e(\varphi) - e(\omega)| \qquad (\varphi, \omega \in \Omega_A) .
\]
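For a concrete instance of Definition 19, consider classical probability theory, where states are probability vectors and effects are vectors $e \in [0,1]^n$ acting by the dot product. The sketch below is an illustration under that assumption (the function name and the sample distributions are hypothetical). Since a linear functional on the cube $[0,1]^n$ attains its maximum at a vertex, it suffices to search over $e \in \{0,1\}^n$; for a difference of two normalized states the result coincides with the usual $\ell_1$ distance.

```python
import itertools
import numpy as np

def one_norm_classical(x):
    """2 * max over effects e in {0,1}^n of |e(x)|, the classical case of Definition 19."""
    best = 0.0
    for e in itertools.product((0.0, 1.0), repeat=len(x)):
        best = max(best, abs(float(np.dot(e, x))))
    return 2.0 * best

# Two hypothetical probability distributions on three outcomes.
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.1, 0.3, 0.6])

value = one_norm_classical(p - q)
l1_distance = float(np.sum(np.abs(p - q)))
print(value, l1_distance)
```

The optimal effect is the indicator of the outcomes on which $p$ exceeds $q$, which is the familiar optimal test for distinguishing two distributions.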

We would like to relate the 2-norm and the 1-norm. The following assumption turns out to be crucial for this. 
Definition 20. We say that a state space is bit-symmetric, if for every two pairs $\varphi, \omega$ and $\varphi', \omega'$ of perfectly distinguishable states, there is a reversible transformation $G$ such that $G\varphi = \varphi'$ and $G\omega = \omega'$.

Bit symmetry is a significant assumption, but it has a nice physical interpretation, see [42]. It allows us to bound the 1-norm in terms of the 2-norm in a way which is analogous to quantum theory:

Theorem 21. Suppose that $A$ is any bit-symmetric state space which satisfies the standard assumptions. Then

\[
\|\varphi - \omega\|_1 \leq \sqrt{N_A}\, \|\varphi - \omega\|_2
\]

for all $\varphi, \omega \in \Omega_A$.

Proof. In [42], it is shown that there is an inner product $[\cdot,\cdot]$ on $A$ with $[\omega, \omega] = 1$ for all pure states $\omega$, and $[\omega, \varphi] = 0$ if $\omega$ is pure and $\omega, \varphi$ are perfectly distinguishable. We give it unusual brackets, because it is in general different from the inner product $\langle\cdot,\cdot\rangle$ that we have used before in this paper (though closely related). Decomposing vectors $x, y \in A$ as $x = x_0 \mu^A + \hat x$ with $\hat x \in \hat A$ (and similarly for $y$), this inner product satisfies

\[
[x, y] = \lambda x_0 y_0 + (1 - \lambda) \langle \hat x, \hat y \rangle
\]

for some $\lambda \in (0,1)$; now, $\langle\cdot,\cdot\rangle$ is our usual inner product on Bloch vectors. Now let $\omega_1, \ldots, \omega_{N_A}$ be a dynamical centered classical subsystem. The states $\omega_1$ and $\omega_2$ are perfectly distinguishable; from [34], we know that $\langle \hat\omega_1, \hat\omega_2 \rangle = -1/(N_A - 1)$. Thus $0 = [\omega_1, \omega_2] = \lambda + (1 - \lambda) \langle \hat\omega_1, \hat\omega_2 \rangle = \lambda - (1 - \lambda)/(N_A - 1)$, which determines $\lambda = 1/N_A$.

Since the order unit $u^A$ as a vector is invariant with respect to all reversible transformations, it must be a multiple of the maximally mixed state, $u^A = c \mu^A$ with some $c > 0$. For the rest of this proof, let $\|x\|_2 := \sqrt{[x, x]}$ denote the norm which is derived from this new inner product. Then we have

\[
\|u^A\|_2^2 = [u^A, u^A] = [u^A, c \mu^A] = c\, [u^A, \mu^A] = c\, u^A(\mu^A) = c .
\]

On the other hand,

\[
\|u^A\|_2^2 = [u^A, u^A] = c^2 [\mu^A, \mu^A] = c^2 \cdot \lambda = c^2 / N_A .
\]

This implies that $\|u^A\|_2 = \sqrt{N_A}$. Now use the self-duality of $A$, which implies [13] that there is a decomposition $\varphi - \omega = R - S$ into effects $R, S \geq 0$ which are orthogonal: $[R, S] = 0$. Since $0 = u(\varphi - \omega) = u(R - S)$, we have $u(R) = u(S)$. Clearly, $\omega_R := R/u(R)$ and $\omega_S := S/u(S) = S/u(R)$ are normalized states. Thus,

\[
\begin{aligned}
\frac{1}{2} \|\varphi - \omega\|_1 &= \max_{0 \leq E \leq u} E(\varphi - \omega) = \max_{0 \leq E \leq u} E(R - S) = u(R) \max_{0 \leq E \leq u} \big( E(\omega_R) - E(\omega_S) \big) \leq u(R) = \frac{1}{2} \big( u(R) + u(S) \big) \\
&= \frac{1}{2} u(R + S) \leq \frac{1}{2} \|u\|_2 \cdot \|R + S\|_2 = \frac{1}{2} \sqrt{N_A} \sqrt{\|R\|_2^2 + \|S\|_2^2} = \frac{1}{2} \sqrt{N_A}\, \|R - S\|_2 = \frac{1}{2} \sqrt{N_A}\, \|\varphi - \omega\|_2 .
\end{aligned}
\]

□
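The quantum-theory counterpart of this bound is the standard inequality between trace norm and Hilbert-Schmidt norm, $\|X\|_1 \leq \sqrt{d}\, \|X\|_2$ for Hermitian $d \times d$ matrices. The sketch below checks it on random pairs of density matrices. It is meant only as an analogy: the Hilbert-Schmidt norm used here is not normalized in the same way as the Bloch-vector 2-norm of this paper, and the sampling procedure and function names are assumptions chosen for illustration.

```python
import numpy as np

def random_density_matrix(d, rng):
    """Random d x d density matrix built from a complex Gaussian matrix."""
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

def trace_norm(x):
    """Sum of absolute eigenvalues of a Hermitian matrix."""
    return float(np.sum(np.abs(np.linalg.eigvalsh(x))))

def hilbert_schmidt_norm(x):
    """Square root of the sum of squared absolute entries."""
    return float(np.sqrt(np.sum(np.abs(x) ** 2)))

rng = np.random.default_rng(1)
d = 4  # hypothetical Hilbert-space dimension, playing the role of N_A
worst_ratio = 0.0
for _ in range(200):
    diff = random_density_matrix(d, rng) - random_density_matrix(d, rng)
    worst_ratio = max(worst_ratio, trace_norm(diff) / hilbert_schmidt_norm(diff))

print(worst_ratio, np.sqrt(d))
```

The inequality follows from the Cauchy-Schwarz inequality applied to the vector of eigenvalues, which mirrors the step $u(R+S) \leq \|u\|_2 \cdot \|R+S\|_2$ in the proof above.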

It is easy to see that the inequality $\|\varphi - \omega\|_1 \leq \sqrt{N_A}\, \|\varphi - \omega\|_2$ is false in general if the state space is not bit-symmetric or does not satisfy the standard assumptions. In particular, the inequality does not follow from transitivity alone. As a simple counterexample, imagine a state space which is a $d$-dimensional cube. It is easy to see that this has $N = 2$ distinguishable states (it is a generalized bit); moreover, it satisfies all standard assumptions (as a stand-alone state space without composites).

Consider two states $\omega, \varphi$ that are adjacent (i.e. neighboring) pure states of the $d$-cube. To compute the distance, we have to imagine that the cube is inscribed into a unit ball. Then the two states have Euclidean distance $\|\varphi - \omega\|_2 = 2/\sqrt{d}$, but they are perfectly distinguishable, i.e. $\frac{1}{2} \|\varphi - \omega\|_1 = 1$. Thus, in this example, $\|\varphi - \omega\|_1 = \sqrt{d}\, \|\varphi - \omega\|_2$, where $d$ can be arbitrarily large, while $N = 2$ is fixed.
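The numbers in the cube counterexample are easy to reproduce. The sketch below places the vertices of the $d$-cube at $(\pm 1/\sqrt{d}, \ldots, \pm 1/\sqrt{d})$, so that the cube is inscribed into the unit ball, and uses the effect $e(x) = (1 + \sqrt{d}\, x_k)/2$, which takes values in $[0,1]$ on the whole cube and separates two vertices differing in coordinate $k$. The coordinates, the choice of effect and the function names are assumptions made for this illustration.

```python
import math
import numpy as np

def adjacent_vertices(d):
    """Two neighbouring vertices of the d-cube inscribed in the unit ball."""
    phi = np.full(d, 1.0 / math.sqrt(d))
    omega = phi.copy()
    omega[0] = -omega[0]  # flip a single coordinate
    return phi, omega

def facet_effect(x, d, k=0):
    """Effect e(x) = (1 + sqrt(d) * x_k) / 2, with values in [0, 1] on the cube."""
    return 0.5 * (1.0 + math.sqrt(d) * x[k])

results = {}
for d in (2, 9, 100):
    phi, omega = adjacent_vertices(d)
    two_norm = float(np.linalg.norm(phi - omega))
    # A single effect already reaches the largest possible value, so this is the 1-norm.
    one_norm = 2.0 * (facet_effect(phi, d) - facet_effect(omega, d))
    results[d] = (two_norm, one_norm, one_norm / two_norm)

print(results)
```

The ratio of the two norms grows like $\sqrt{d}$, whereas a bound of the form of Theorem 21 would cap it at $\sqrt{N} = \sqrt{2}$.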

Employing the above inequality on the generalised decoupling theorem we then obtain the following decoupling 
theorem. 

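Theorem 22 (Decoupling, 1-norm version). If the subsystem $CA_2$ is additionally bit-symmetric, then

\[
\int_{U \in \mathcal{G}_A} \big\| \sigma(U)^{CA_2} - \psi^C \otimes \mu^{A_2} \big\|_1^2\, dU
\leq \frac{N_C N_{A_2} (K_{A_2} - 1)}{(N_C N_{A_2} - 1)(K_A - 1)} \Big[ \mathcal{P}(\psi^{CA}) (N_C N_A - 1) - \mathcal{P}(\psi^C) (N_C - 1) \Big]
\lesssim \mathcal{P}(\psi^{CA}) \cdot \frac{N_C N_A}{K_{A_1}} \quad \text{if all } N, K \text{ large.}
\]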



